OPTIMALLY SPARSE APPROXIMATIONS OF 3D FUNCTIONS BY 
Abstract. We study efficient and reliable methods of capturing and sparsely representing
anisotropic structures in 3D data. As a model class for multidimensional data with anisotropic features,
we introduce generalized three-dimensional cartoon-like images. This function class will have two
smoothness parameters: one parameter $\beta$ controlling classical smoothness and one parameter $\alpha$
controlling anisotropic smoothness. The class then consists of piecewise $C^\beta$-smooth functions with
discontinuities on a piecewise $C^\alpha$-smooth surface. We introduce a pyramid-adapted, hybrid shearlet
system for the three-dimensional setting and construct frames for $L^2(\mathbb{R}^3)$ with this particular shearlet
structure. For the smoothness range $1 < \alpha \le \beta \le 2$ we show that pyramid-adapted shearlet systems
provide a nearly optimally sparse approximation rate within the generalized cartoon-like image model
class measured by means of non-linear $N$-term approximations.
1. Introduction. Recent advances in modern technology have created a new
world of huge, multi-dimensional data. In biomedical imaging, seismic imaging, as- 
tronomical imaging, computer vision, and video processing, the capabilities of mod- 
ern computers and high-precision measuring devices have generated 2D, 3D and even 
higher dimensional data sets of sizes that were infeasible just a few years ago. The 
need to efficiently handle such diverse types and huge amounts of data has initiated 
an intense study in developing efficient multivariate encoding methodologies in the 
applied harmonic analysis research community. In neuro-imaging, e.g., fluorescence 
microscopy scans of living cells, the discontinuity curves and surfaces of the data are 
important specific features since one often wants to distinguish between the image "ob- 
jects" and the "background", e.g., to distinguish actin filaments in eukaryotic cells; that 
is, it is important to precisely capture the edges of these 1D and 2D structures. This
specific application is an illustration that important classes of multivariate problems 
are governed by anisotropic features. The anisotropic structures can be distinguished 
by location and orientation or direction which indicates that our way of analyzing 
and representing the data should capture not only location, but also directional infor- 
mation. This is exactly the idea behind so-called directional representation systems 
which by now are well developed and understood for the 2D setting. Since much of 
the data acquired in, e.g., neuro-imaging, are truly three-dimensional, analyzing such
data should be performed by three-dimensional directional representation systems. 
Hence, in this paper, we aim for the 3D setting.

In applied harmonic analysis the data is typically modeled in a continuum setting 
as square-integrable functions or distributions. In dimension two, to analyze the 
ability of representation systems to reliably capture and sparsely represent anisotropic 
structures, Candès and Donoho [7] introduced the model situation of so-called cartoon-like
images, i.e., two-dimensional functions which are piecewise $C^2$-smooth apart from
a piecewise $C^2$-smooth discontinuity curve. Within this model class there is an optimal sparse
approximation rate one can obtain for a large class of non-adaptive and adaptive
representation systems. Intuitively, one might think adaptive systems would be far
superior in this task, but it has been shown in recent years that non-adaptive methods
using curvelets, contourlets, and shearlets all have the ability to essentially optimally
sparsely approximate cartoon-like images in 2D, measured by the $L^2$-error of the best
$N$-term approximation [7,13,17,24].

1.1. Dimension three. In the present paper we will consider sparse approxima- 
tions of cartoon-like images using shearlets in dimension three. The step from the one- 
dimensional setting to the two-dimensional setting is necessary for the appearance of 
anisotropic features at all. When further passing from the two-dimensional setting to 
the three-dimensional setting, the complexity of anisotropic structures changes signifi- 
cantly. In 2D one "only" has to handle one type of anisotropic features, namely curves, 
whereas in 3D one has to handle two geometrically very different anisotropic struc- 
tures: Curves as one-dimensional features and surfaces as two-dimensional anisotropic 
features. Moreover, the analysis of sparse approximations in dimension two depends 
heavily on reducing the analysis to affine subspaces of $\mathbb{R}^2$. Clearly, these subspaces
always have dimension and co-dimension one in 2D. In dimension three, however, 
we have subspaces of co-dimension one and two, and one therefore needs to perform 
the analysis on subspaces of the "correct" co-dimension. Therefore, the 3D analysis 
requires fundamental new ideas. 

Finally, we remark that even though the present paper only deals with the construction
of shearlet frames for $L^2(\mathbb{R}^3)$ and sparse approximations of such, it also
illustrates how many of the problems that arise when passing to higher dimensions
can be handled. Hence, once it is known how to handle anisotropic features of different
dimensions in 3D, the step from 3D to 4D can be dealt with in a similar way, as
can the extension to even higher dimensions. Therefore, the extension of the presented
result in $L^2(\mathbb{R}^3)$ to higher dimensions $L^2(\mathbb{R}^n)$ should be, if not straightforward, then
at least achievable by the methodologies developed.

1.2. Modelling anisotropic features. The class of 2D cartoon-like images
consists, as mentioned above, of piecewise $C^2$-smooth functions with discontinuities
on a piecewise $C^2$-smooth curve, and this class has been investigated in a number of
recent publications. The obvious extension to the 3D setting is to consider functions of
three variables that are piecewise $C^2$-smooth with discontinuities on a piecewise
$C^2$-smooth surface. In some applications the $C^2$-smoothness requirement is too strict,
and we will, therefore, go one step further and consider a larger class of images also
containing less regular images. The generalized class of cartoon-like images in 3D
considered in this paper consists of three-dimensional piecewise $C^\beta$-smooth functions with
discontinuities on a piecewise $C^\alpha$ surface for $\alpha \in (1,2]$. Clearly, this model provides
us with two new smoothness parameters: $\beta$ being a classical smoothness parameter
and $\alpha$ being an anisotropic smoothness parameter; see Figure 1.1 for an illustration.
This image class is unfortunately not a linear space, unlike traditional smoothness spaces,
e.g., Hölder, Besov, or Sobolev spaces, but it allows one to study the quality of the
performance of representation systems with respect to capturing anisotropic features,
something that is not possible with traditional smoothness spaces.

Finally, we mention that allowing piecewise $C^\alpha$-smoothness and not everywhere
$C^\alpha$-smoothness is an essential way to model singularities along surfaces as well as
along curves, which we already described as the two fundamental types of anisotropic
phenomena in 3D.

Figure 1.1. The support of a 3D cartoon-like image $f = f_0 \chi_B$, where $f_0$ is smooth
and the discontinuity surface $\partial B$ is piecewise $C^\alpha$-smooth.

1.3. Measure for Sparse Approximation and Optimality. The quality of
the performance of a representation system with respect to cartoon-like images is
typically measured by taking a non-linear approximation viewpoint. More precisely,
given a cartoon-like image and a representation system, the chosen measure is the
asymptotic behavior of the $L^2$ error of $N$-term (non-linear) approximations in the
number of terms $N$. When the anisotropic smoothness $\alpha$ is bounded by the classical
smoothness as $\alpha \le \frac{4}{3}\beta$, the anisotropic smoothness of the cartoon-like images will be
the determining factor for the optimal approximation error rate one can obtain. To
be more precise, as we will show in Section 3, the optimal approximation rate for
generalized 3D cartoon-like images $f$ which can be achieved for a large class
of adaptive and non-adaptive representation systems for $1 < \alpha \le \beta \le 2$ is

\[ \|f - f_N\|_{L^2}^2 \le C \cdot N^{-\alpha/2} \quad \text{as } N \to \infty, \]

for some constant $C > 0$, where $f_N$ is an $N$-term approximation of $f$. For cartoon-like
images, wavelet and Fourier methods will typically have an $N$-term approximation
error rate decaying as $N^{-1/2}$ and $N^{-1/3}$ as $N \to \infty$, respectively, see [23]. Hence, as
the anisotropic smoothness parameter $\alpha$ grows, the approximation quality of traditional
tools becomes increasingly inferior as they will deliver approximation error rates
that are far from the optimal rate $N^{-\alpha/2}$. Therefore, it is desirable and necessary to
search for new representation systems that can provide us with representations with
a rate closer to the optimal one. This is where pyramid-adapted, hybrid shearlet systems enter
the scene. As we will see in Section 6, this type of representation system provides
nearly optimally sparse approximations:

\[ \|f - f_N\|_{L^2}^2 \le \begin{cases} C \cdot N^{-\alpha/2+\tau}, & \text{if } \alpha \le \beta \le 2 \text{ and } \alpha < 2, \\ C \cdot N^{-1} (\log N)^2, & \text{if } \beta = \alpha = 2, \end{cases} \]

where $f_N$ is the $N$-term approximation obtained by keeping the $N$ largest shearlet
coefficients, and $\tau = \tau(\alpha)$ with $0 < \tau < 0.04$ and $\tau \to 0$ for $\alpha \to 1^+$ and for $\alpha \to 2^-$.
Clearly, the obtained sparse approximations for these shearlet systems are not truly
optimal owing to the polynomial factor $N^\tau$ for $\alpha < 2$ and the polylog factor for $\alpha = 2$.
On the other hand, it still shows that non-adaptive schemes such as the hybrid shearlet
system can provide rates that are nearly optimal within a large class of adaptive and
non-adaptive methods.

1.4. Construction of 3D hybrid shearlets. Shearlet theory has become a 
central tool in analyzing and representing 2D data with anisotropic features. Shearlet 
systems are systems of functions generated by one single generator with parabolic 
scaling, shearing, and translation operators applied to it, in much the same way
wavelet systems are dyadic scalings and translations of a single function, but includ- 
ing a directionality characteristic owing to the additional shearing operation and the 
anisotropic scaling. Of the many directional representation systems proposed in the 
last decade, e.g., steerable pyramid transform [29], directional filter banks [3], 2D 
directional wavelets [2], curvelets [6], contourlets [13], bandelets [28], the shearlet
system [25] is among the most versatile and successful. The reason for this is an
extensive list of desirable properties: Shearlet systems can be generated by one
function, they precisely resolve wavefront sets, they allow compactly supported analyzing
elements, they are associated with fast decomposition algorithms, and they provide 
a unified treatment of the continuum and the digital realm. We refer to [22] for a 
detailed review of the advantages and disadvantages of shearlet systems as opposed 
to other directional representation systems. 

Several constructions of discrete band-limited and compactly supported 2D shearlet
frames are already known, see [9, 11, 15, 20, 21, 26]; for the construction of 3D shearlet
frames less is known. Dahlke, Steidl, and Teschke [10] recently generalized the
shearlet group and the associated continuous shearlet transform to higher dimensions
$\mathbb{R}^n$. Furthermore, in [10] they showed that, for certain band-limited generators, the
continuous shearlet transform is able to identify hyperplane and tetrahedron singularities.
Since this transform originates from a unitary group representation, it is not
able to capture all directions; in particular, it will not capture the delta distribution
on the $x_1$-axis (and more generally, any singularity with "$x_1$-directions"). We will
use a different tiling of the frequency space, namely systems adapted to pyramids in
frequency space, to avoid this non-uniformity of directions. We call these systems
pyramid-adapted shearlet systems [22]. In [16], the continuous version of the pyramid-adapted
shearlet system was introduced, and it was shown that the location and the
local orientation of the boundary set of certain three-dimensional solid regions can be
precisely identified by this continuous shearlet transform. Finally, we will also need
to use a different scaling than the one from [10] in order to achieve shearlet systems
that provide almost optimally sparse approximations.

Since spatial localization of the analyzing elements of the encoding system is
very important both for a precise detection of geometric features as well as for a
fast decomposition algorithm, we will mainly follow the sufficient conditions for and
construction of compactly supported cone-adapted 2D shearlets by Kittipoom and
two of the authors [20] and extend these results to the 3D setting (Section 4). These
results provide us with a large class of separable, compactly supported shearlet systems
with "good" frame bounds, optimally sparse approximation properties, and associated
numerically stable algorithms. One important new aspect is that dilation will depend
on the smoothness parameter $\alpha$. This will provide us with hybrid shearlet systems
ranging from classical parabolic based shearlet systems ($\alpha = 2$) to almost classical
wavelet systems ($\alpha \approx 1$). In other words, we obtain a parametrized family of shearlets
with a smooth transition from (nearly) wavelets to shearlets. This will allow us
to adjust our shearlet system according to the anisotropic smoothness of the data
at hand. For rational values of $\alpha$ we can associate this hybrid system with a fast
decomposition algorithm using the fast Fourier transform with multiplication and
periodization in the frequency space (in place of convolution and down-sampling).

Our compactly supported 3D hybrid shearlet elements (introduced in Section 4)
will in the spatial domain be of size $2^{-j\alpha/2}$ times $2^{-j/2}$ times $2^{-j/2}$ for some fixed
anisotropy parameter $1 < \alpha \le 2$. When $\alpha \approx 1$ this corresponds to "cube-like" (or
"wavelet-like") elements. As $\alpha$ approaches 2 the scaling becomes less and less isotropic,
yielding "plate-like" elements as $j \to \infty$. This indicates that these anisotropic 3D
shearlet systems have been designed to efficiently capture two-dimensional anisotropic
structures, but neglecting one-dimensional structures. Nonetheless, these 3D shearlet
systems still perform optimally when representing and analyzing cartoon-like functions
that have discontinuities on piecewise $C^\alpha$-smooth surfaces; as mentioned, such
functions model 3D data that contain point, curve, and surface singularities.
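
To make the transition from cube-like to plate-like elements concrete, the following minimal sketch evaluates the element size $2^{-j\alpha/2} \times 2^{-j/2} \times 2^{-j/2}$ quoted above. The function names `element_size` and `aspect_ratio` are hypothetical helpers introduced only for this illustration, and constants depending on the generator are ignored.

```python
def element_size(j, alpha):
    """Approximate spatial extent of a hybrid shearlet element at scale j.

    Returns (short, long, long) = (2^(-j*alpha/2), 2^(-j/2), 2^(-j/2)),
    ignoring constants that depend on the chosen generator.
    """
    if not 1.0 < alpha <= 2.0:
        raise ValueError("alpha must lie in (1, 2]")
    short = 2.0 ** (-j * alpha / 2.0)  # extent across the plate
    long = 2.0 ** (-j / 2.0)           # extent along each of the two plate directions
    return short, long, long


def aspect_ratio(j, alpha):
    """Ratio long/short = 2^(j*(alpha-1)/2); close to 1 means cube-like."""
    short, long, _ = element_size(j, alpha)
    return long / short


if __name__ == "__main__":
    for alpha in (1.01, 1.5, 2.0):
        print(alpha, [round(aspect_ratio(j, alpha), 3) for j in (2, 6, 10)])
```

For $\alpha$ close to 1 the ratio stays near 1 over many scales, whereas for $\alpha = 2$ it grows like $2^{j/2}$, which is the parabolic scaling of classical shearlets.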

Let us end this subsection with a general thought on the construction of band-limited
tight shearlet frames versus compactly supported shearlet frames. There seems
to be a trade-off between compact support of the shearlet generators, tightness of the 
associated frame, and separability of the shearlet generators. The known construc- 
tions of tight shearlet frames, even in 2D, do not use separable generators, and these 
constructions can be shown to not be applicable to compactly supported genera- 
tors. Moreover, these tight frames use a modified version of the pyramid-adapted 
shearlet system in which not all elements are dilates, shears, and translations of a sin- 
gle function. Tightness is difficult to obtain while allowing for compactly supported 
generators, but we can gain separability as in Theorem 5.4 and hence fast algorithmic
realizations. On the other hand, when allowing non-compactly supported generators, 
tightness is possible, but separability seems to be out of reach, which makes fast 
algorithmic realizations very difficult. 

1.5. Other approaches for 3D data. Other directional representation systems
have been considered for the 3D setting. We mention curvelets [4,5], surflets [8], and
surfacelets [27]. This line of research is mostly concerned with constructions of such
systems and not their sparse approximation properties with respect to cartoon-like
images. In [8], however, the authors consider adaptive approximations of Horizon
class functions using surflet dictionaries, which generalize the wedgelet dictionary for
2D signals to higher dimensions.

During the final stages of this project, we realized that a similar almost optimal
sparsity result for the 3D setting (for the model case $\alpha = \beta = 2$) was reported by
Guo and Labate [18] using band-limited shearlet tight frames. They provide a proof
for the case where the discontinuity surface is (non-piecewise) $C^2$-smooth using the
X-ray transform.

1.6. Outline. We give the precise definition of the generalized cartoon-like image
model class in Section 2, and the optimal rate of approximation within this model
is then derived in Section 3. In Section 4 and Section 5 we construct the so-called
pyramid-adapted shearlet frames with compactly supported generators. In Sections 6
to 9 we then prove that such shearlet systems indeed deliver nearly optimal sparse
approximations of three-dimensional cartoon-like images. We extend this result to
the situation of discontinuity surfaces which are piecewise $C^\alpha$-smooth except for zero-
and one-dimensional singularities and again derive essentially optimal sparsity of the
constructed shearlet frames in Section 10. We end the paper by discussing various
possible extensions in Section 11.
1.7. Notation. We end this introduction by reviewing some basic definitions.
The following definitions will mostly be used for the case $n = 3$, but they will
be stated for general $n \in \mathbb{N}$. For $x \in \mathbb{R}^n$ we denote the $p$-norm on $\mathbb{R}^n$ of $x$ by $\|x\|_p$.
The Lebesgue measure on $\mathbb{R}^n$ is denoted by $|\cdot|$ and the counting measure by $\#|\cdot|$.
Sets in $\mathbb{R}^n$ are either considered equal if they are equal up to sets of measure zero or if
they are element-wise equal; it will always be clear from the context which definition
is used. The $L^p$-norm of $f \in L^p(\mathbb{R}^n)$ is denoted by $\|f\|_{L^p}$. For $f \in L^1(\mathbb{R}^n)$, the
Fourier transform is defined by

\[ \hat f(\xi) = \int_{\mathbb{R}^n} f(x)\, e^{-2\pi i \langle \xi, x \rangle} \, dx \]

with the usual extension to $L^2(\mathbb{R}^n)$. The Sobolev space and norm are defined as

\[ H^s(\mathbb{R}^n) = \Big\{ f : \mathbb{R}^n \to \mathbb{C} \,:\, \|f\|_{H^s}^2 := \int_{\mathbb{R}^n} \big(1 + |\xi|^2\big)^s \, |\hat f(\xi)|^2 \, d\xi < +\infty \Big\}. \]

For functions $f : \mathbb{R}^n \to \mathbb{C}$ the homogeneous Hölder seminorm is given by

\[ \|f\|_{\dot C^\beta} := \max_{|\gamma| = \lfloor \beta \rfloor} \; \sup_{x \ne x'} \frac{|\partial^\gamma f(x) - \partial^\gamma f(x')|}{\|x - x'\|_2^{\{\beta\}}}, \]

where $\{\beta\} = \beta - \lfloor \beta \rfloor$ is the fractional part of $\beta$ and $|\gamma|$ is the usual length of a
multi-index $\gamma = (\gamma_1, \gamma_2, \dots, \gamma_n)$. Further, we let

\[ \|f\|_{C^\beta} := \max_{|\gamma| \le \lfloor \beta \rfloor} \sup |\partial^\gamma f| + \|f\|_{\dot C^\beta}, \]

and we denote by $C^\beta(\mathbb{R}^n)$ the space of Hölder functions, i.e., functions $f : \mathbb{R}^n \to \mathbb{C}$
whose $C^\beta$-norm is bounded.

2. Generalized 3D cartoon-like image model class. The first complete
model of 2D cartoon-like images was introduced in [7], the basic idea being that
a closed $C^2$-curve separates two $C^2$-smooth functions. For 3D cartoon-like images we
consider square integrable functions of three variables that are piecewise $C^\beta$-smooth
with discontinuities on a piecewise $C^\alpha$-smooth surface.

Fix $\alpha > 0$ and $\beta > 0$, and let $\rho : [0, 2\pi) \times [0, \pi] \to [0, \infty)$ be continuous and define
the set $B$ in $\mathbb{R}^3$ by

\[ B = \{ x \in \mathbb{R}^3 : \|x\|_2 \le \rho(\theta_1, \theta_2),\ x = (\|x\|_2, \theta_1, \theta_2) \text{ in spherical coordinates} \}. \]

We require that the boundary $\partial B$ of $B$ is a closed surface parametrized by

\[ b(\theta_1, \theta_2) = \begin{pmatrix} \rho(\theta_1, \theta_2) \cos(\theta_1) \sin(\theta_2) \\ \rho(\theta_1, \theta_2) \sin(\theta_1) \sin(\theta_2) \\ \rho(\theta_1, \theta_2) \cos(\theta_2) \end{pmatrix}, \qquad \theta = (\theta_1, \theta_2) \in [0, 2\pi) \times [0, \pi]. \tag{2.1} \]

Furthermore, the radius function $\rho$ must be Hölder continuous with coefficient $\nu$, i.e.,

\[ \|\rho\|_{\dot C^\alpha} = \max_{|\gamma| = \lfloor \alpha \rfloor} \; \sup_{\theta \ne \theta'} \frac{|\partial^\gamma \rho(\theta) - \partial^\gamma \rho(\theta')|}{\|\theta - \theta'\|_2^{\{\alpha\}}} \le \nu, \qquad \rho = \rho(\theta_1, \theta_2), \quad \rho \le \rho_0 < 1. \tag{2.2} \]

For $\nu > 0$, the set $STAR^\alpha(\nu)$ is defined to be the set of all $B \subset [0,1]^3$ such
that $B$ is a translate of a set obeying (2.1) and (2.2). The boundaries of the sets
in $STAR^\alpha(\nu)$ will be the discontinuity sets of our cartoon-like images. We
remark that any starshaped set in $[0,1]^3$ with bounded principal curvatures will
belong to $STAR^2(\nu)$ for some $\nu$. Actually, the property that the sets in $STAR^\alpha(\nu)$ are
parametrized by spherical angles, which implies that the sets are starshaped, is not
important to us. For $\alpha = 2$ we could, e.g., extend $STAR^2(\nu)$ to be all bounded subsets
of $[0,1]^3$ whose boundary is a closed surface with principal curvatures bounded
by $\nu$.
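
As a small numerical illustration of the spherical parametrization (2.1), the sketch below evaluates boundary points and tests membership in a set $B$ given by a radius function. The helper names and the particular radius function `rho_example` are assumptions made for this example only; the sketch checks neither the Hölder condition (2.2) nor the translation into $[0,1]^3$.

```python
import numpy as np


def boundary_point(rho, theta1, theta2):
    """Point b(theta1, theta2) on the boundary of B, following (2.1)."""
    r = rho(theta1, theta2)
    return np.array([
        r * np.cos(theta1) * np.sin(theta2),
        r * np.sin(theta1) * np.sin(theta2),
        r * np.cos(theta2),
    ])


def in_B(rho, x):
    """Membership test ||x||_2 <= rho(theta1, theta2) in spherical coordinates."""
    x = np.asarray(x, dtype=float)
    r = np.linalg.norm(x)
    if r == 0.0:
        return True  # the origin lies in B whenever rho >= 0
    theta1 = np.arctan2(x[1], x[0]) % (2.0 * np.pi)   # azimuth in [0, 2*pi)
    theta2 = np.arccos(np.clip(x[2] / r, -1.0, 1.0))  # polar angle in [0, pi]
    return bool(r <= rho(theta1, theta2))


def rho_example(theta1, theta2):
    """Hypothetical smooth radius function with 0.2 <= rho <= 0.4 < 1."""
    return 0.3 + 0.1 * np.sin(theta2) ** 2 * np.cos(2.0 * theta1)


if __name__ == "__main__":
    b = boundary_point(rho_example, 0.7, 1.1)
    print(np.linalg.norm(b), rho_example(0.7, 1.1))  # the two values agree
    print(in_B(rho_example, 0.5 * b), in_B(rho_example, 1.5 * b))
```

Scaling a boundary point towards the origin keeps it inside $B$, and scaling it outwards leaves $B$, which is the starshapedness implied by the parametrization.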

To allow more general discontinuity surfaces, we extend $STAR^\alpha(\nu)$ to a class of
sets $B$ with piecewise $C^\alpha$ boundaries $\partial B$. We denote this class $STAR^\alpha(\nu, L)$, where
$L \in \mathbb{N}$ is the number of $C^\alpha$ pieces and $\nu > 0$ is an upper bound for the "curvature"
on each piece. In other words, we say that $B \in STAR^\alpha(\nu, L)$ if $B$ is a bounded subset
of $[0,1]^3$ whose boundary $\partial B$ is a union of finitely many pieces $\partial B_1, \dots, \partial B_L$ which
do not overlap except at their boundaries, and each patch $\partial B_i$ can be represented
in parametric form $\rho_i = \rho_i(\theta_1, \theta_2)$ by a $C^\alpha$-smooth radius function with $\|\rho_i\|_{\dot C^\alpha} \le \nu$.
We remark that we put no restrictions on how the patches $\partial B_i$ meet; in particular,
$B \in STAR^\alpha(\nu, L)$ can have arbitrarily sharp edges joining the pieces $\partial B_i$. Also note
that $STAR^\alpha(\nu) = STAR^\alpha(\nu, 1)$.

The actual objects of interest to us are, as mentioned, not these starshaped sets, 
but functions that have the boundary dB as discontinuity surface. 

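Definition 2.1. Let $\nu, \mu > 0$, $\alpha, \beta \in (1,2]$, and $L \in \mathbb{N}$. Then $\mathcal{E}^\beta_{\alpha,L}(\mathbb{R}^3)$ denotes
the set of functions $f : \mathbb{R}^3 \to \mathbb{C}$ of the form

\[ f = f_0 + f_1 \chi_B, \]

where $B \in STAR^\alpha(\nu, L)$ and $f_i \in C^\beta(\mathbb{R}^3)$ with $\operatorname{supp} f_0 \subset [0,1]^3$ and $\|f_i\|_{C^\beta} \le \mu$ for
each $i = 0, 1$. We let $\mathcal{E}^\beta_\alpha(\mathbb{R}^3) := \mathcal{E}^\beta_{\alpha,1}(\mathbb{R}^3)$.

We speak of $\mathcal{E}^\beta_{\alpha,L}(\mathbb{R}^3)$ as consisting of cartoon-like 3D images having $C^\beta$-smoothness
apart from a piecewise $C^\alpha$ discontinuity surface. We stress that $\mathcal{E}^\beta_{\alpha,L}(\mathbb{R}^3)$ is
not a linear space of functions and that $\mathcal{E}^\beta_{\alpha,L}(\mathbb{R}^3)$ depends on the constants $\nu$ and $\mu$
even though we suppress this in the notation. Finally, we let $\mathcal{E}^{\mathrm{bin}}_{\alpha,L}(\mathbb{R}^3)$ denote binary
cartoon-like images, that is, functions $f = f_0 + f_1 \chi_B \in \mathcal{E}^\beta_{\alpha,L}(\mathbb{R}^3)$, where $f_0 = 0$ and
$f_1 = 1$.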

3. Optimality bound for sparse approximations. After having clarified the
model situation $\mathcal{E}^\beta_{\alpha,L}(\mathbb{R}^3)$, we will now discuss which measure for the accuracy of
approximation by representation systems we choose, and what optimality means in
this case. We will later in Section 6 restrict the parameter range in our model class
$\mathcal{E}^\beta_{\alpha,L}(\mathbb{R}^3)$ to $1 < \alpha \le \beta \le 2$. In this section, however, we will find the theoretical
optimal approximation error rate within $\mathcal{E}^\beta_{\alpha,L}(\mathbb{R}^3)$ for the full range $1 < \alpha \le 2$ and
$\beta > 0$. Before we state and prove the main optimal sparsity result of this section,
Theorem 3.2, we discuss the notions of $N$-term approximations and frames.

3.1. N-term approximations. Let $\Phi = \{\phi_i\}_{i \in I}$ be a dictionary with the index
set $I$ not necessarily being countable. We seek to approximate each single element of
$\mathcal{E}^\beta_{\alpha,L}(\mathbb{R}^3)$ with elements from $\Phi$ by $N$ terms of this system. For this, let $f \in \mathcal{E}^\beta_{\alpha,L}(\mathbb{R}^3)$
be arbitrarily chosen. Letting now $N \in \mathbb{N}$, we consider $N$-term approximations of $f$,
i.e.,

\[ \sum_{i \in I_N} c_i \phi_i \quad \text{with } I_N \subset I,\ \#|I_N| = N. \]

The best $N$-term approximation to $f$ is an $N$-term approximation

\[ f_N = \sum_{i \in I_N} c_i \phi_i, \]

which satisfies that, for all $I_N \subset I$, $\#|I_N| = N$, and for all scalars $(c_i)_{i \in I}$,

\[ \|f - f_N\|_{L^2} \le \Big\| f - \sum_{i \in I_N} c_i \phi_i \Big\|_{L^2}. \]

3.2. Frames. A frame for a separable Hilbert space $\mathcal{H}$ is a countable collection
of vectors $\{f_j\}_{j \in J}$ for which there are constants $0 < A \le B < \infty$ such that

\[ A \|f\|^2 \le \sum_{j \in J} |\langle f, f_j \rangle|^2 \le B \|f\|^2 \quad \text{for all } f \in \mathcal{H}. \]

If the upper bound in this inequality holds, then $\{f_j\}_{j \in J}$ is said to be a Bessel sequence
with Bessel constant $B$. For a Bessel sequence $\{f_j\}_{j \in J}$, we define the frame operator
of $\{f_j\}_{j \in J}$ by

\[ S : \mathcal{H} \to \mathcal{H}, \quad S f = \sum_{j \in J} \langle f, f_j \rangle f_j. \]

If $\{f_j\}_{j \in J}$ is a frame, this operator is bounded, invertible, and positive. A frame
$\{f_j\}_{j \in J}$ is said to be tight if we can choose $A = B$. If furthermore $A = B = 1$, the
sequence $\{f_j\}_{j \in J}$ is said to be a Parseval frame. Two Bessel sequences $\{f_j\}_{j \in J}$ and
$\{g_j\}_{j \in J}$ are said to be dual frames if

\[ f = \sum_{j \in J} \langle f, g_j \rangle f_j \quad \text{for all } f \in \mathcal{H}. \]

It can be shown that, in this case, both Bessel sequences are even frames, and we
shall say that the frame $\{g_j\}_{j \in J}$ is dual to $\{f_j\}_{j \in J}$, and vice versa. At least one dual
always exists; it is given by $\{S^{-1} f_j\}_{j \in J}$ and called the canonical dual.
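
The frame notions above can be checked numerically in finite dimensions. The following sketch is an illustration in the real space $\mathbb{R}^2$, not the shearlet setting of this paper; the helper name `frame_analysis` and the choice of three unit vectors at 120 degrees are assumptions made for the example. It computes the frame operator, the optimal frame bounds, and the canonical dual.

```python
import numpy as np


def frame_analysis(F):
    """F holds the frame vectors f_j as rows.

    Returns (A, B, S, dual), where S is the frame operator, A and B are the
    optimal frame bounds (extreme eigenvalues of S), and the rows of dual
    are the canonical dual vectors S^{-1} f_j.
    """
    S = F.T @ F                      # S f = sum_j <f, f_j> f_j
    eig = np.linalg.eigvalsh(S)      # eigenvalues in ascending order
    A, B = eig[0], eig[-1]
    dual = F @ np.linalg.inv(S)      # row j equals (S^{-1} f_j)^T since S is symmetric
    return A, B, S, dual


# Three unit vectors at 120 degrees in R^2.
angles = np.pi / 2 + 2 * np.pi * np.arange(3) / 3
F = np.column_stack([np.cos(angles), np.sin(angles)])
A, B, S, dual = frame_analysis(F)

f = np.array([0.3, -1.2])
reconstruction = dual.T @ (F @ f)    # sum_j <f, f_j> S^{-1} f_j

if __name__ == "__main__":
    print("frame bounds:", A, B)
    print("reconstruction error:", np.linalg.norm(reconstruction - f))
```

For this particular frame the bounds coincide, so the frame is tight, and the canonical dual is simply a rescaling of the frame itself.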

Now, suppose the dictionary $\Phi$ forms a frame for $L^2(\mathbb{R}^3)$ with frame bounds $A$ and
$B$, and let $\{\tilde\phi_i\}_{i \in I}$ denote the canonical dual frame. We then consider the expansion
of $f$ in terms of this dual frame, i.e.,

\[ f = \sum_{i \in I} \langle f, \phi_i \rangle \tilde\phi_i. \]

For any $f \in L^2(\mathbb{R}^3)$ we have $(\langle f, \phi_i \rangle)_{i \in I} \in \ell^2(I)$ by definition. Since we only consider
expansions of functions $f$ belonging to a subset $\mathcal{E}^\beta_{\alpha,L}(\mathbb{R}^3)$ of $L^2(\mathbb{R}^3)$, this can, at least
potentially, improve the decay rate of the coefficients so that they belong to $\ell^p(I)$ for
some $p < 2$. This is exactly what is understood by sparse approximation (also called
compressible approximation). We hence aim to analyze shearlets with respect to this
behavior, i.e., the decay rate of shearlet coefficients.

For frames, tight and non-tight, it is not possible to derive a usable, explicit form
for the best $N$-term approximation. We therefore crudely approximate the best $N$-term
approximation by choosing the $N$-term approximation provided by the indices $I_N$
associated with the $N$ largest coefficients $\langle f, \phi_i \rangle$ in magnitude with these coefficients,
i.e.,

\[ f_N = \sum_{i \in I_N} \langle f, \phi_i \rangle \tilde\phi_i. \]

However, even with this rather crude greedy selection procedure, we obtain very strong
results for the approximation rate of shearlets as we will see in Section 6.

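The following well-known result shows how the $N$-term approximation error can
be bounded by the tail of the square of the coefficients $c_i = \langle f, \phi_i \rangle$. We refer to [23]
for a proof.

Lemma 3.1. Let $\{\phi_i\}_{i \in I}$ be a frame for $\mathcal{H}$ with frame bounds $A$ and $B$, and let
$\{\tilde\phi_i\}_{i \in I}$ be the canonical dual frame. Let $I_N \subset I$ with $\#|I_N| = N$, and let $f_N$ be the
$N$-term approximation $f_N = \sum_{i \in I_N} \langle f, \phi_i \rangle \tilde\phi_i$. Then

\[ \|f - f_N\|^2 \le \frac{1}{A} \sum_{i \notin I_N} |\langle f, \phi_i \rangle|^2 \]

for any $f \in L^2(\mathbb{R}^3)$.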
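
The bound of Lemma 3.1 can be observed numerically for a finite frame. The sketch below is only an illustration under simplifying assumptions: the Hilbert space is $\mathbb{R}^4$, the frame consists of 12 random Gaussian vectors (a frame with probability one), and `n_term_error_and_bound` is a hypothetical helper. It keeps the $N$ largest coefficients and compares the squared error with the tail bound.

```python
import numpy as np


def n_term_error_and_bound(Phi, f, N):
    """Phi holds the frame vectors as rows.

    Returns (||f - f_N||^2, (1/A) * sum of squared dropped coefficients),
    where f_N keeps the N largest coefficients <f, phi_i> in magnitude and
    is synthesized with the canonical dual frame.
    """
    S = Phi.T @ Phi                        # frame operator
    A = np.linalg.eigvalsh(S)[0]           # lower frame bound
    dual = Phi @ np.linalg.inv(S)          # canonical dual frame (rows)
    c = Phi @ f                            # coefficients <f, phi_i>
    order = np.argsort(-np.abs(c))         # indices by decreasing magnitude
    keep, drop = order[:N], order[N:]
    f_N = dual[keep].T @ c[keep]           # sum over kept i of c_i * dual_i
    err = float(np.sum((f - f_N) ** 2))
    bound = float(np.sum(c[drop] ** 2) / A)
    return err, bound


rng = np.random.default_rng(0)
Phi = rng.standard_normal((12, 4))         # 12 random vectors in R^4
f = rng.standard_normal(4)

if __name__ == "__main__":
    for N in (0, 2, 4, 8, 12):
        err, bound = n_term_error_and_bound(Phi, f, N)
        print(N, err, bound)
```

The error never exceeds the bound, and both vanish once all coefficients are kept.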

Let $c^*$ denote the non-increasing (in modulus) rearrangement of $c = (c_i)_{i \in I} =
(\langle f, \phi_i \rangle)_{i \in I}$, e.g., $c^*_n$ denotes the $n$th largest coefficient of $c$ in modulus. This
rearrangement corresponds to a bijection $\pi : \mathbb{N} \to I$ that satisfies

\[ \pi : \mathbb{N} \to I, \quad c_{\pi(n)} = c^*_n \quad \text{for all } n \in \mathbb{N}. \]

Since $c \in \ell^2(I)$, also $c^* \in \ell^2(\mathbb{N})$. Let $f$ be a cartoon-like image, and suppose that $|c^*_n|$,
in this case, even decays as

\[ |c^*_n| \lesssim n^{-(\alpha+2)/4} \quad \text{for } n \to \infty \tag{3.1} \]

for some $\alpha > 0$, where the notation $h(n) \lesssim g(n)$ means that there exists a $C > 0$
such that $h(n) \le C g(n)$, i.e., $h(n) = O(g(n))$. Clearly, we then have $c^* \in \ell^p(\mathbb{N})$ for
$p > \frac{4}{\alpha+2}$. By Lemma 3.1, the $N$-term approximation error will therefore decay as

\[ \|f - f_N\|^2 \le \frac{1}{A} \sum_{n > N} |c^*_n|^2 \lesssim \sum_{n > N} n^{-\alpha/2 - 1} \asymp N^{-\alpha/2}, \tag{3.2} \]

where $f_N$ is the $N$-term approximation of $f$ by keeping the $N$ largest coefficients, that
is,

\[ f_N = \sum_{n=1}^{N} c^*_n \, \tilde\phi_{\pi(n)}. \tag{3.3} \]

The notation $h(n) \asymp g(n)$, sometimes also written as $h(n) = \Theta(g(n))$, used above
means that $h$ is bounded both above and below by $g$ asymptotically as $n \to \infty$,
that is, $h(n) = O(g(n))$ and $g(n) = O(h(n))$. The approximation error rate $N^{-\alpha/2}$
obtained in (3.2) is exactly the sought optimal rate mentioned in the introduction.
This illustrates that the fraction $\frac{\alpha+2}{4}$ introduced in the decay of the sequence $c^*$ will
play a major role in the following. In particular, we are searching for a representation
system $\Phi$ which forms a frame and delivers decay of $c = (\langle f, \phi_i \rangle)_{i \in I}$ as in (3.1) for any
cartoon-like image.
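
The last step in (3.2), namely that the tail $\sum_{n>N} n^{-\alpha/2-1}$ behaves like $N^{-\alpha/2}$, follows from comparison with the integral $\int_N^\infty x^{-\alpha/2-1}\,dx = \frac{2}{\alpha} N^{-\alpha/2}$. A quick numerical sanity check is sketched below; the truncation point `n_max` and the function names are assumptions of this illustration, and the truncation makes the computed values slightly smaller than the true tail.

```python
def tail_sum(N, alpha, n_max=200_000):
    """Truncated tail: sum of n^(-alpha/2 - 1) over N < n <= n_max."""
    return sum(n ** (-alpha / 2.0 - 1.0) for n in range(N + 1, n_max + 1))


def normalized_tail(N, alpha):
    """Tail divided by N^(-alpha/2); integral comparison suggests about 2/alpha."""
    return tail_sum(N, alpha) / N ** (-alpha / 2.0)


if __name__ == "__main__":
    for alpha in (1.5, 2.0):
        print(alpha, normalized_tail(100, alpha), 2.0 / alpha)
```

The normalized tail stays close to the constant $2/\alpha$, consistent with the two-sided bound expressed by $\asymp$.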

3.3. Optimal sparsity. In this subsection we will state and prove the main
result of this section, Theorem 3.2, but let us first discuss some of its implications for
sparse approximations of cartoon-like images.

From the dictionary $\Phi = \{\phi_i\}_{i \in I}$, with the index set $I$ not necessarily being
countable, we consider expansions of the form

\[ f = \sum_{i \in I_f} c_i \phi_i, \tag{3.4} \]

where $I_f \subset I$ is a countable selection from $I$ that may depend on $f$. Moreover, we can
assume that the $\phi_i$ are normalized by $\|\phi_i\|_{L^2} = 1$. The selection of the $i$th term is obtained
according to a selection rule $\sigma(i, f)$ which may adaptively depend on $f$. Actually,
the $i$th element may also be modified adaptively and depend on the first $i - 1$
chosen elements [14]. We assume that how deep or how far down in the indexed
dictionary $\Phi$ we are allowed to search for the next element $\phi_i$ in the approximation
is limited by a polynomial $\pi$. Without such a depth search limit, one could choose $\Phi$
to be a countable, dense subset of $L^2(\mathbb{R}^3)$, which would yield arbitrarily good sparse
approximations, but also infeasible approximations in practice. We shall denote any
sequence of coefficients $c_i$ chosen according to these restrictions by $c(f) = (c(f)_i)_i$.

We are now ready to state the main result of this section. Following Donoho [14]
we say that a function class $\mathcal{F}$ contains an embedded orthogonal hypercube of dimension
$m$ and side $\delta$ if there exist $f_0 \in \mathcal{F}$ and orthogonal functions $\psi_{i,m,\delta}$, $i = 1, \dots, m$,
with $\|\psi_{i,m,\delta}\|_{L^2} = \delta$, such that the collection of hypercube vertices

\[ \mathcal{H}(m; f_0, \{\psi_i\}) := \Big\{ f_0 + \sum_{i=1}^{m} \varepsilon_i \psi_{i,m,\delta} \,:\, \varepsilon_i \in \{0, 1\} \Big\} \]

is contained in $\mathcal{F}$. The sought bound on the optimal sparsity within the set of cartoon-like
images will be obtained by showing that the cartoon-like image class contains
sufficiently high-dimensional hypercubes with sufficiently large sidelength; intuitively,
we will see that a certain high complexity of the set of cartoon-like images limits the
possible sparsity level. The meaning of "sufficiently" is made precise by the following
definition. We say that a function class $\mathcal{F}$ contains a copy of $\ell^p_0$ if $\mathcal{F}$ contains embedded
orthogonal hypercubes of dimension $m(\delta)$ and side $\delta$, and if, for some sequence $\delta_k \to 0$
and some constant $C > 0$:

\[ m(\delta_k) \ge C \delta_k^{-p}, \quad k = k_0, k_0 + 1, \dots \tag{3.5} \]
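
To illustrate the geometry of an embedded orthogonal hypercube, the following sketch enumerates the $2^m$ vertices $f_0 + \sum_i \varepsilon_i \psi_i$ in a finite-dimensional stand-in for $L^2$, with vectors in $\mathbb{R}^m$ playing the role of functions. The helper `hypercube_vertices` and the choice of scaled standard basis vectors for the $\psi_i$ are assumptions made for this illustration only.

```python
import itertools

import numpy as np


def hypercube_vertices(f0, psis):
    """All vertices f0 + sum_i eps_i * psi_i with eps_i in {0, 1}.

    Returns a dict mapping each sign pattern eps (a tuple) to its vertex.
    """
    m = len(psis)
    return {
        eps: f0 + sum(e * p for e, p in zip(eps, psis))
        for eps in itertools.product((0, 1), repeat=m)
    }


delta = 0.5
psis = [delta * e for e in np.eye(3)]          # orthogonal, each of norm delta
vertices = hypercube_vertices(np.zeros(3), psis)

if __name__ == "__main__":
    a, b = vertices[(0, 0, 0)], vertices[(1, 1, 0)]
    # Vertices differing in k coordinates are at distance delta * sqrt(k).
    print(len(vertices), np.linalg.norm(a - b), delta * np.sqrt(2))
```

Because the $\psi_i$ are orthogonal with norm $\delta$, two vertices differing in $k$ coordinates are exactly $\delta\sqrt{k}$ apart, which is why many well-separated vertices force a high complexity of the function class.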

The first part of the following result is an extension from the 2D to the 3D setting 
of [14, Thm. 3]. 
Theorem 3.2.

(i) The class of binary cartoon-like images $\mathcal{E}^{\mathrm{bin}}_{\alpha,L}(\mathbb{R}^3)$ contains a copy of $\ell^p_0$ for
$p = 4/(\alpha + 2)$.

(ii) The space of Hölder functions $C^\beta(\mathbb{R}^3)$ with compact support in $[0,1]^3$ contains
a copy of $\ell^p_0$ for $p = 6/(2\beta + 3)$.

Before providing a proof of the theorem, let us discuss some of its implications
for sparse approximations of cartoon-like images. Theorem 3.2(i) implies, by [14,
Theorem 2], that for every $p < 4/(\alpha + 2)$ and every method of atomic decomposition
based on polynomial $\pi$ depth search from any countable dictionary $\Phi$, we have for
$f \in \mathcal{E}^{\mathrm{bin}}_{\alpha,L}(\mathbb{R}^3)$:

\[ \min_{\sigma(n,f) \le \pi(n)} \; \max_{f \in \mathcal{E}^{\mathrm{bin}}_{\alpha,L}(\mathbb{R}^3)} \|c(f)\|_{w\ell^p} = +\infty, \tag{3.6} \]

where the weak-$\ell^p$ "norm"^1 is defined as $\|c(f)\|_{w\ell^p} = \sup_{n > 0} n^{1/p} |c^*_n|$. Sparse
approximations are approximations of the form $\sum_i c(f)_i \phi_i$ with coefficients $c(f)^*_n$ decaying
at a certain, hopefully high, rate. Equation (3.6) is a precise statement of the optimal
achievable sparsity level. No representation system (up to the restrictions described
above) can deliver expansions (3.4) for $\mathcal{E}^{\mathrm{bin}}_{\alpha,L}(\mathbb{R}^3)$ with coefficients satisfying $c(f) \in w\ell^p$
for $p < 4/(\alpha + 2)$. As we will see in Theorems 6.1 and 6.2, pyramid-adapted shearlet
frames deliver $(\langle f, \psi_\lambda \rangle)_\lambda \in w\ell^p$ for $p = 4/(\alpha + 2 - 2\tau)$, where $0 < \tau < 0.04$.

^1 Note that neither $\|\cdot\|_{w\ell^p}$ nor $\|\cdot\|_{\ell^p}$ (for $p < 1$) is a norm since they do not satisfy the triangle
inequality. Note also that the weak-$\ell^p$ norm is a special case of the Lorentz quasinorm.

Assume for a moment that we have an "optimal" dictionary $\Phi$ at hand that delivers $c(f) \in w\ell^{4/(\alpha+2)}$, and assume further that it is also a frame. As we saw in Section 3.2, this implies that

\[ \|f - f_N\|^2_{L^2} \lesssim N^{-\alpha/2} \quad \text{as } N \to \infty, \]

where $f_N$ is the $N$-term approximation of $f$ by keeping the $N$ largest coefficients. Therefore, no frame representation system can deliver a better approximation error rate than $O(N^{-\alpha/2})$ under the chosen approximation procedure within the image model class $\mathcal{E}^{\mathrm{bin}}_\alpha(\mathbb{R}^3)$. If $\Phi$ is actually an orthonormal basis, then this is truly the optimal rate since best $N$-term approximations, in this case, are obtained by keeping the $N$ largest coefficients.
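
The passage from weak-$\ell^p$ coefficient decay to an error rate can be written out in one line. The following is only a sketch: it assumes, as in the frame setting of Section 3.2, that the squared error is controlled by the tail of the squared rearranged coefficients, and $C$ denotes an unspecified constant.

```latex
\|f - f_N\|_{L^2}^2
  \;\lesssim\; \sum_{n > N} |c(f)^*_n|^2
  \;\le\; C \sum_{n > N} n^{-2/p}
  \;\le\; C \int_N^\infty x^{-2/p}\, dx
  \;=\; \frac{C}{2/p - 1}\, N^{-(2/p - 1)},
  \qquad 0 < p < 2 .
```

For $p = 4/(\alpha+2)$ the exponent is $2/p - 1 = \alpha/2$, which is the rate stated above.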

Similarly, Theorem 3.2(ii) tells us that the optimal approximation error rate within the Hölder function class is $O(N^{-2\beta/3})$. Combining the two estimates we see that the optimal approximation error rate within the full cartoon-like image class $\mathcal{E}^\beta_\alpha(\mathbb{R}^3)$ cannot exceed $O(N^{-\min\{\alpha/2,\, 2\beta/3\}})$ convergence. For the parameter range $1 < \alpha \le \beta \le 2$, this rate reduces to $O(N^{-\alpha/2})$. For $\alpha = \beta = 2$, as we will show in Section 6, shearlet systems actually deliver this rate except for an additional polylog factor, namely $O(N^{-\alpha/2}(\log N)^2) = O(N^{-1}(\log N)^2)$. For $1 < \alpha \le \beta \le 2$ and $\alpha < 2$, the log-factor is replaced by a small polynomial factor $N^{\tau(\alpha)}$, where $\tau(\alpha) < 0.04$ and $\tau(\alpha) \to 0$ for $\alpha \to 1+$ or $\alpha \to 2-$.

It is striking that one is able to obtain such a near optimal approximation error rate since the shearlet system as well as the approximation procedure will be non-adaptive; in particular, since traditional, non-adaptive representation systems such as Fourier series and wavelet systems are far from providing an almost optimal approximation rate. This is illustrated in the following example.

Example 1. Let $B = B(x, r)$ be the ball in $[0,1]^3$ with center $x$ and radius $r$. Define $f = \chi_B$. Clearly, $f \in \mathcal{E}^2_2(\mathbb{R}^3)$ if $B \subset [0,1]^3$. Suppose $\Phi = \{e^{2\pi i k x}\}_{k \in \mathbb{Z}^3}$. The best $N$-term Fourier sum $f_N$ yields

\[ \|f - f_N\|^2_{L^2} \asymp N^{-1/3} \quad \text{for } N \to \infty, \]

which is far from the optimal rate $N^{-1}$. For the wavelet case the situation is only slightly better. Suppose $\Phi$ is any compactly supported wavelet basis. Then

\[ \|f - f_N\|^2_{L^2} \asymp N^{-1/2} \quad \text{for } N \to \infty, \]

where $f_N$ is the best $N$-term approximation from $\Phi$. The calculations leading to these estimates are not difficult, and we refer to [23] for the details. We will later see that shearlet frames yield $\|f - f_N\|^2_{L^2} \lesssim N^{-1}(\log N)^2$, where $f_N$ is the best $N$-term approximation.

We mention that the rates obtained in Example 1 are typical in the sense that 
most cartoon-like images will yield the exact same (and far from optimal) rates. 
Finally, we end the subsection with a proof of Theorem 3.2. 

Proof. [Proof of Theorem 3.2] The idea behind the proofs is to construct a collection of functions in $\mathcal{E}^{\mathrm{bin}}_\alpha(\mathbb{R}^3)$ and $C^\beta(\mathbb{R}^3)$, respectively, such that the collection of functions will be vertices of a hypercube with dimension satisfying (3.5).

(i): Let $\varphi_1$ and $\varphi_2$ be smooth $C^\infty$ functions with compact support $\operatorname{supp} \varphi_1 \subset [0, 2\pi]$ and $\operatorname{supp} \varphi_2 \subset [0, \pi]$. For $A > 0$ and $m \in \mathbb{N}$ we define:

\[ \varphi_{i,m}(t) = \varphi_{i_1,i_2,m}(t) = A m^{-\alpha} \varphi_1(m t_1 - 2\pi i_1)\, \varphi_2(m t_2 - \pi i_2), \]

for $i_1, i_2 \in \{0, \dots, m-1\}$, where $i = (i_1, i_2)$ and $t = (t_1, t_2)$. We let further $\varphi(t) := \varphi_1(t_1)\varphi_2(t_2)$. It is easy to see that $\|\varphi_{i,m}\|_{L^1} = m^{-\alpha-2} A \|\varphi\|_{L^1}$. Moreover, it can also be shown that $\|\varphi_{i,m}\|_{\dot{C}^\alpha} = A \|\varphi\|_{\dot{C}^\alpha}$, where $\|\cdot\|_{\dot{C}^\alpha}$ denotes the homogeneous Hölder norm introduced in (2.2).

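Without loss of generality, we can consider the cartoon-like images $\mathcal{E}^{\mathrm{bin}}_\alpha(\mathbb{R}^3)$ translated by $-(\tfrac12, \tfrac12, \tfrac12)$ so that their support lies in $[-1/2, 1/2]^3$. Alternatively, we can fix an origin at $(1/2, 1/2, 1/2)$, and use spherical coordinates $(\rho, \theta_1, \theta_2)$ relative to this choice of origin. We set $\rho_0 = 1/4$ and define

\[ \psi_{i,m} = \chi_{\{\rho_0 < \rho \le \rho_0 + \varphi_{i,m}\}} \quad \text{for } i_1, i_2 \in \{0, \dots, m-1\}. \]

The radius functions $\rho_\gamma$ for $\gamma = (\gamma_{i_1,i_2})_{i_1,i_2 \in \{0,\dots,m-1\}}$ with $\gamma_{i_1,i_2} \in \{0,1\}$ defined by

\[ \rho_\gamma(\theta_1, \theta_2) = \rho_0 + \sum_{i_1=1}^{m} \sum_{i_2=1}^{m} \gamma_{i_1,i_2}\, \varphi_{i,m}(\theta_1, \theta_2), \tag{3.7} \]

determine the discontinuity surfaces of the functions of the form:

\[ f_\gamma = \chi_{\{\rho \le \rho_0\}} + \sum_{i_1=1}^{m} \sum_{i_2=1}^{m} \gamma_{i_1,i_2}\, \psi_{i,m} \quad \text{for } \gamma_{i_1,i_2} \in \{0,1\}. \]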

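For a fixed $m$ the functions $\psi_{i,m}$ are disjointly supported and therefore mutually orthogonal. Hence, $\mathcal{H}(m^2, \chi_{\{\rho \le \rho_0\}}, \{\psi_{i,m}\})$ is a collection of hypercube vertices. Moreover,

\[ \|\psi_{i,m}\|^2_{L^2} = \lambda\big(\{(\rho, \theta_1, \theta_2) : \rho_0 \le \rho \le \rho_0 + \varphi_{i,m}(\theta_1, \theta_2)\}\big)
 \le \int_0^{2\pi} \int_0^{\pi} \int_{\rho_0}^{\rho_0 + \varphi_{i,m}(\theta_1,\theta_2)} \rho^2 \sin\theta_2 \, d\rho \, d\theta_2 \, d\theta_1
 \le C_0\, m^{-\alpha-2} \|\varphi\|_{L^1}, \]

where the constant $C_0$ only depends on $A$. Any radius function $\rho = \rho(\theta_1, \theta_2)$ of the form (3.7) satisfies

\[ \|\rho_\gamma\|_{\dot{C}^\alpha} \le \|\varphi_{i,m}\|_{\dot{C}^\alpha} = A \|\varphi\|_{\dot{C}^\alpha}. \]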

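Therefore, $\|\rho_\gamma\|_{\dot{C}^\alpha} \le \nu$ whenever $A \le \nu / \|\varphi\|_{\dot{C}^\alpha}$. This shows that we have the hypercube embedding

\[ \mathcal{H}(m^2, \chi_{\{\rho \le \rho_0\}}, \{\psi_{i,m}\}) \subset \mathcal{E}^{\mathrm{bin}}_\alpha(\mathbb{R}^3). \]

The side length $\delta = \|\psi_{i,m}\|_{L^2}$ of the hypercube satisfies

\[ \delta^2 \le C_0\, m^{-\alpha-2} \|\varphi\|_{L^1} \le \nu\, m^{-\alpha-2} \quad \text{whenever } C_0 \le \frac{\nu}{\|\varphi\|_{L^1}}. \]

Now, we finally choose $m$ and $A$ as

\[ m(\delta) = \left\lfloor \left( \frac{\delta^2}{\nu} \right)^{-1/(\alpha+2)} \right\rfloor \quad \text{and} \quad A(\delta, \nu) = \delta^2 m^{\alpha+2} / \|\varphi\|_{L^1}. \]

By this choice, we have $C_0 \le \nu / \|\varphi\|_{\dot{C}^\alpha}$ for sufficiently small $\delta$. Hence, $\mathcal{H}$ is a hypercube of side length $\delta$ and dimension $d = m(\delta)^2$ embedded in $\mathcal{E}^{\mathrm{bin}}_\alpha(\mathbb{R}^3)$. We obviously have $m(\delta) \ge C_1 \nu^{1/(\alpha+2)} \delta^{-2/(\alpha+2)}$, thus the dimension $d$ of the hypercube obeys

\[ d \ge C_2\, \delta^{-4/(\alpha+2)} \]

for all sufficiently small $\delta > 0$.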

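(ii): Let $\varphi \in C_0^\infty(\mathbb{R})$ with compact support $\operatorname{supp} \varphi \subset [0,1]$. For $m \in \mathbb{N}$ to be determined, we define for $i_1, i_2, i_3 \in \{0, \dots, m-1\}$:

\[ \psi_{i,m}(t) = \psi_{i_1,i_2,i_3,m}(t) = m^{-\beta} \varphi(m t_1 - i_1)\, \varphi(m t_2 - i_2)\, \varphi(m t_3 - i_3), \]

where $i = (i_1, i_2, i_3)$ and $t = (t_1, t_2, t_3)$. We let $\psi(t) := \varphi(t_1)\varphi(t_2)\varphi(t_3)$. It is easy to see that $\|\psi_{i,m}\|^2_{L^2} = m^{-2\beta-3} \|\psi\|^2_{L^2}$. We note that the functions $\psi_{i,m}$ are disjointly supported (for a fixed $m$) and therefore mutually orthogonal. Thus we have the hypercube embedding

\[ \mathcal{H}(m^3, 0, \{\psi_{i,m}\}) \subset C^\beta(\mathbb{R}^3), \]

where the side length of the hypercube is $\delta = \|\psi_{i,m}\|_{L^2} = m^{-\beta-3/2} \|\psi\|_{L^2}$. Now, choose $m$ as

\[ m(\delta) = \left\lfloor \left( \frac{\delta}{\|\psi\|_{L^2}} \right)^{-1/(\beta+3/2)} \right\rfloor. \]

Hence, $\mathcal{H}$ is a hypercube of side length $\delta$ and dimension $d = m(\delta)^3$ embedded in $C^\beta(\mathbb{R}^3)$. The dimension $d$ of the hypercube obeys

\[ d \ge C\, \delta^{-3/(\beta+3/2)} = C\, \delta^{-6/(2\beta+3)}, \]

for all sufficiently small $\delta > 0$. □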

3.4. Higher dimensions. Our main focus is, as mentioned above, the three-dimensional setting, but let us briefly sketch how the optimal sparsity result extends to higher dimensions. The $d$-dimensional cartoon-like image class $\mathcal{E}^\beta_\alpha(\mathbb{R}^d)$ consists of functions having $C^\beta$-smoothness apart from a $(d-1)$-dimensional $C^\alpha$-smooth discontinuity surface. The $d$-dimensional analogue of Theorem 3.2 is then straightforward to prove.

Theorem 3.3.

(i) The class of $d$-dimensional binary cartoon-like images $\mathcal{E}^{\mathrm{bin}}_\alpha(\mathbb{R}^d)$ contains a copy of $\ell^p_0$ for $p = 2(d-1)/(\alpha + d - 1)$.

(ii) The space of Hölder functions $C^\beta(\mathbb{R}^d)$ contains a copy of $\ell^p_0$ for $p = 2d/(2\beta + d)$.

It is then intriguing to analyze the behavior of $p = 2(d-1)/(\alpha + d - 1)$ and $p = 2d/(2\beta + d)$ from Theorem 3.3. In fact, as $d \to \infty$, we observe that $p \to 2$ in both cases. Thus, the decay of any $c(f)$ for cartoon-like images becomes slower as $d$ grows and approaches $\ell^2$, which is actually the rate guaranteed for all $f \in L^2(\mathbb{R}^d)$. Moreover, by Theorem 3.3 we see that the optimal approximation error rate for $N$-term approximations $f_N$ within the class of $d$-dimensional cartoon-like images $\mathcal{E}^\beta_\alpha(\mathbb{R}^d)$ is $N^{-\min\{\alpha/(d-1),\, 2\beta/d\}}$. In this paper we will however restrict ourselves to the case $d = 3$ since we, as mentioned in the introduction, can see this dimension as a critical one.
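
The dependence of these exponents on the dimension is easy to tabulate. The following is a small illustrative sketch, not part of the original argument: the function name `sparsity_exponents` is hypothetical, and the conversion from $p$ to the error exponent uses the relation "decay $n^{-1/p}$ gives squared error $N^{-(2/p-1)}$" for $p < 2$.

```python
from fractions import Fraction


def sparsity_exponents(alpha, beta, d=3):
    """Return (p_surface, p_smooth, rate) for dimension d.

    p_surface = 2(d-1)/(alpha+d-1)  (binary cartoon-like images)
    p_smooth  = 2d/(2*beta+d)       (Hoelder functions)
    rate      = min(alpha/(d-1), 2*beta/d), the exponent of N in the
                optimal squared-error rate N^(-rate).
    """
    alpha, beta = Fraction(alpha), Fraction(beta)
    p_surface = Fraction(2 * (d - 1)) / (alpha + d - 1)
    p_smooth = Fraction(2 * d) / (2 * beta + d)
    # Coefficient decay n^(-1/p) corresponds to the error exponent 2/p - 1.
    rate = min(2 / p_surface - 1, 2 / p_smooth - 1)
    return p_surface, p_smooth, rate


if __name__ == "__main__":
    for d in (2, 3, 4, 10, 100):
        print(d, sparsity_exponents(2, 2, d))
```

Exact rational arithmetic is used so that the values can be compared directly with the closed-form expressions; both values of $p$ approach 2 as $d$ grows.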



4. Hybrid shearlets in 3D. After we have set our benchmark for directional representation systems in the sense of stating an optimality criterion for sparse approximations of the cartoon-like image class $\mathcal{E}^\beta_\alpha(\mathbb{R}^3)$, we next introduce the class of shearlet systems we claim behave optimally.

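4.1. Pyramid-adapted shearlet systems. Fix $\alpha \in (1,2]$. We scale according to scaling matrices $A_{2^j}$, $\tilde{A}_{2^j}$ or $\breve{A}_{2^j}$, $j \in \mathbb{Z}$, and represent directionality by the shear matrices $S_k$, $\tilde{S}_k$, or $\breve{S}_k$, $k = (k_1, k_2) \in \mathbb{Z}^2$, defined by

\[ A_{2^j} = \begin{pmatrix} 2^{j\alpha/2} & 0 & 0 \\ 0 & 2^{j/2} & 0 \\ 0 & 0 & 2^{j/2} \end{pmatrix}, \quad
\tilde{A}_{2^j} = \begin{pmatrix} 2^{j/2} & 0 & 0 \\ 0 & 2^{j\alpha/2} & 0 \\ 0 & 0 & 2^{j/2} \end{pmatrix}, \quad \text{and} \quad
\breve{A}_{2^j} = \begin{pmatrix} 2^{j/2} & 0 & 0 \\ 0 & 2^{j/2} & 0 \\ 0 & 0 & 2^{j\alpha/2} \end{pmatrix}, \]

and

\[ S_k = \begin{pmatrix} 1 & k_1 & k_2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad
\tilde{S}_k = \begin{pmatrix} 1 & 0 & 0 \\ k_1 & 1 & k_2 \\ 0 & 0 & 1 \end{pmatrix}, \quad \text{and} \quad
\breve{S}_k = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ k_1 & k_2 & 1 \end{pmatrix}, \]

respectively. The case $\alpha = 2$ corresponds to paraboloidal scaling. As $\alpha$ decreases, the scaling becomes less anisotropic, and allowing $\alpha = 1$ would yield isotropic scaling. The action of isotropic scaling and shearing is illustrated in Figure 4.1. The translation lattices will be generated by the following matrices: $M_c = \operatorname{diag}(c_1, c_2, c_2)$, $\tilde{M}_c = \operatorname{diag}(c_2, c_1, c_2)$, and $\breve{M}_c = \operatorname{diag}(c_2, c_2, c_1)$, where $c_1 > 0$ and $c_2 > 0$.

Figure 4.1. Sketch of the action of scaling ($\alpha \approx 2$) and shearing. For $\psi \in L^2(\mathbb{R}^3)$ with $\operatorname{supp} \psi \subset [0,1]^3$ we plot the support of $\psi(S_k A_{2^j} \cdot)$ for fixed $j > 0$ and various $k = (k_1, k_2) \in \mathbb{Z}^2$. From left to right: $k_1 = k_2 = 0$; $k_1 = 0$, $k_2 < 0$; and $k_1 < 0$, $k_2 = 0$.

Figure 4.2. Sketch of the partition of the frequency domain. The centered cube $\mathcal{C}$ is shown, and the arrangement of the six pyramids is indicated by the "diagonal" lines. We refer to Figure 4.3 for a sketch of the pyramids.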
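
To make the three families of matrices concrete, here is a minimal sketch that builds them as nested Python lists. The function names and the `axis` argument (0, 1, 2 for the plain, tilde, and breve versions) are illustrative choices, not notation from the paper.

```python
def scaling_matrix(j, alpha, axis=0):
    """Scaling matrix: 2^(j*alpha/2) on the chosen axis, 2^(j/2) on the others."""
    diag = [2.0 ** (j / 2)] * 3
    diag[axis] = 2.0 ** (j * alpha / 2)
    return [[diag[r] if r == c else 0.0 for c in range(3)] for r in range(3)]


def shear_matrix(k1, k2, axis=0):
    """Shear matrix: identity plus (k1, k2) in the off-diagonal slots of row `axis`."""
    S = [[1.0 if r == c else 0.0 for c in range(3)] for r in range(3)]
    off = [a for a in range(3) if a != axis]
    S[axis][off[0]] = float(k1)
    S[axis][off[1]] = float(k2)
    return S


def translation_matrix(c1, c2, axis=0):
    """Translation lattice matrix: c1 on the chosen axis, c2 on the others."""
    diag = [c2] * 3
    diag[axis] = c1
    return [[diag[r] if r == c else 0.0 for c in range(3)] for r in range(3)]


if __name__ == "__main__":
    print(scaling_matrix(2, 2.0))        # paraboloidal scaling at scale j = 2
    print(shear_matrix(2, -1, axis=1))   # the tilde version of S_k
```

Choosing `axis=0` reproduces $A_{2^j}$, $S_k$, and $M_c$; the other two values give the tilde and breve variants by permuting the distinguished coordinate.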



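We next partition the frequency domain into the following six pyramids:

\[ \mathcal{P}_\iota = \begin{cases}
\{(\xi_1,\xi_2,\xi_3) \in \mathbb{R}^3 : \xi_1 \ge 1,\ |\xi_2/\xi_1| \le 1,\ |\xi_3/\xi_1| \le 1\} & : \iota = 1, \\
\{(\xi_1,\xi_2,\xi_3) \in \mathbb{R}^3 : \xi_2 \ge 1,\ |\xi_1/\xi_2| \le 1,\ |\xi_3/\xi_2| \le 1\} & : \iota = 2, \\
\{(\xi_1,\xi_2,\xi_3) \in \mathbb{R}^3 : \xi_3 \ge 1,\ |\xi_1/\xi_3| \le 1,\ |\xi_2/\xi_3| \le 1\} & : \iota = 3, \\
\{(\xi_1,\xi_2,\xi_3) \in \mathbb{R}^3 : \xi_1 \le -1,\ |\xi_2/\xi_1| \le 1,\ |\xi_3/\xi_1| \le 1\} & : \iota = 4, \\
\{(\xi_1,\xi_2,\xi_3) \in \mathbb{R}^3 : \xi_2 \le -1,\ |\xi_1/\xi_2| \le 1,\ |\xi_3/\xi_2| \le 1\} & : \iota = 5, \\
\{(\xi_1,\xi_2,\xi_3) \in \mathbb{R}^3 : \xi_3 \le -1,\ |\xi_1/\xi_3| \le 1,\ |\xi_2/\xi_3| \le 1\} & : \iota = 6,
\end{cases} \]

and a centered cube

\[ \mathcal{C} = \{(\xi_1,\xi_2,\xi_3) \in \mathbb{R}^3 : \|(\xi_1,\xi_2,\xi_3)\|_\infty < 1\}. \]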

The partition is illustrated in Figures 4.2 and 4.3. This partition of the frequency space into pyramids allows us to restrict the range of the shear parameters. In case of the shearlet group systems, one must allow arbitrarily large shear parameters. For the pyramid-adapted systems, we can, however, restrict the shear parameters to $[-\lceil 2^{j(\alpha-1)/2} \rceil, \lceil 2^{j(\alpha-1)/2} \rceil]$. We would like to emphasize that this approach is important for providing an almost uniform treatment of different directions, in the sense of a good approximation to rotation.
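
The partition and the restricted shear range can be illustrated with a few lines of code. This is only a sketch with hypothetical helper names: `pyramid_index` returns 0 for the centered cube and otherwise the index of one pyramid containing the point (points on shared boundaries belong to several pyramids, and the first match is returned), while `max_shear` evaluates the bound $\lceil 2^{j(\alpha-1)/2} \rceil$.

```python
import math


def pyramid_index(xi):
    """Return 0 for the centered cube, else the index 1..6 of a pyramid containing xi."""
    mags = [abs(v) for v in xi]
    m = max(mags)
    if m < 1:
        return 0  # sup-norm below 1: the point lies in the centered cube
    axis = mags.index(m)  # dominant coordinate; the ratios to it are at most 1
    return axis + 1 if xi[axis] > 0 else axis + 4


def max_shear(j, alpha):
    """Largest admissible |k_1|, |k_2| at scale j: ceil(2^(j(alpha-1)/2))."""
    return math.ceil(2.0 ** (j * (alpha - 1) / 2))


if __name__ == "__main__":
    print(pyramid_index((2.0, 0.5, -1.0)), pyramid_index((0.0, 0.0, -2.0)))
    print([max_shear(j, 1.5) for j in range(8)])
```

The slow growth of `max_shear` for $\alpha$ close to 1 reflects the fact that nearly isotropic scaling needs only a few shear directions per scale.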

These considerations are made precise in the following definition. 

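Definition 4.1. For $\alpha \in (1,2]$ and $c = (c_1, c_2) \in (\mathbb{R}_+)^2$, the pyramid-adapted, hybrid shearlet system $SH(\phi, \psi, \tilde\psi, \breve\psi; c, \alpha)$ generated by $\phi, \psi, \tilde\psi, \breve\psi \in L^2(\mathbb{R}^3)$ is defined by

\[ SH(\phi, \psi, \tilde\psi, \breve\psi; c, \alpha) = \Phi(\phi; c_1) \cup \Psi(\psi; c, \alpha) \cup \tilde\Psi(\tilde\psi; c, \alpha) \cup \breve\Psi(\breve\psi; c, \alpha), \]

where

\[ \Phi(\phi; c_1) = \{\phi_m = \phi(\cdot - m) : m \in c_1 \mathbb{Z}^3\}, \]
\[ \Psi(\psi; c, \alpha) = \{\psi_{j,k,m} = 2^{j\frac{\alpha+2}{4}} \psi(S_k A_{2^j} \cdot - m) : j \ge 0,\ |k| \le \lceil 2^{j(\alpha-1)/2} \rceil,\ m \in M_c \mathbb{Z}^3\}, \]
\[ \tilde\Psi(\tilde\psi; c, \alpha) = \{\tilde\psi_{j,k,m} = 2^{j\frac{\alpha+2}{4}} \tilde\psi(\tilde{S}_k \tilde{A}_{2^j} \cdot - m) : j \ge 0,\ |k| \le \lceil 2^{j(\alpha-1)/2} \rceil,\ m \in \tilde{M}_c \mathbb{Z}^3\}, \]

and

\[ \breve\Psi(\breve\psi; c, \alpha) = \{\breve\psi_{j,k,m} = 2^{j\frac{\alpha+2}{4}} \breve\psi(\breve{S}_k \breve{A}_{2^j} \cdot - m) : j \ge 0,\ |k| \le \lceil 2^{j(\alpha-1)/2} \rceil,\ m \in \breve{M}_c \mathbb{Z}^3\}, \]

where $j \in \mathbb{N}_0$ and $k \in \mathbb{Z}^2$. Here we have used the vector notation $|k| \le K$ for $k = (k_1, k_2)$ and $K > 0$ to denote $|k_1| \le K$ and $|k_2| \le K$. We will often use $\Psi(\psi)$ as shorthand notation for $\Psi(\psi; c, \alpha)$. If $SH(\phi, \psi, \tilde\psi, \breve\psi; c, \alpha)$ is a frame for $L^2(\mathbb{R}^3)$, we refer to $\phi$ as a scaling function and $\psi$, $\tilde\psi$, and $\breve\psi$ as shearlets. Moreover, we often simply term $SH(\phi, \psi, \tilde\psi, \breve\psi; c, \alpha)$ a pyramid-adapted shearlet system.

(a) Pyramids $\mathcal{P}_1$ and $\mathcal{P}_4$ and the $\xi_1$ axis. (b) Pyramids $\mathcal{P}_2$ and $\mathcal{P}_5$ and the $\xi_2$ axis. (c) Pyramids $\mathcal{P}_3$ and $\mathcal{P}_6$ and the $\xi_3$ axis.

Figure 4.3. The partition of the frequency domain: The "top" of the six pyramids.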

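We let $\mathcal{P} = \mathcal{P}_1 \cup \mathcal{P}_4$, $\tilde{\mathcal{P}} = \mathcal{P}_2 \cup \mathcal{P}_5$, and $\breve{\mathcal{P}} = \mathcal{P}_3 \cup \mathcal{P}_6$. In the remainder of this paper, we shall mostly consider $\mathcal{P}$; the analysis for $\tilde{\mathcal{P}}$ and $\breve{\mathcal{P}}$ is similar (simply append $\tilde{\ }$ and $\breve{\ }$, respectively, to suitable symbols).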

We will often assume the shearlets to be compactly supported in spatial domain. If e.g., $\operatorname{supp} \psi \subset [0,1]^3$, then the shearlet element $\psi_{j,k,m}$ will be supported in a parallelepiped with side lengths $2^{-j\alpha/2}$, $2^{-j/2}$, and $2^{-j/2}$, see Figure 4.1. For $\alpha = 2$ this shows that the shearlet elements will become plate-like as $j \to \infty$. As $\alpha$ approaches 1 the scaling becomes almost isotropic giving almost isotropic cube-like elements. The key fact to mind is, however, that our shearlet elements always become plate-like as $j \to \infty$ with aspect ratio depending on $\alpha$.

In general, however, we will have very weak requirements on the shearlet generators $\psi$, $\tilde\psi$, and $\breve\psi$. As a typical minimal requirement in our construction and approximation results we will require the shearlet $\psi$ to be feasible.

Definition 4.2. Let $\delta, \gamma > 0$. A function $\psi \in L^2(\mathbb{R}^3)$ is called a $(\delta, \gamma)$-feasible shearlet associated with $\mathcal{P}$, if there exist $q \ge q' > 0$, $q \ge r > 0$, $q \ge s > 0$ such that

\[ |\hat\psi(\xi)| \le \min\{1, |q\xi_1|^\delta\} \min\{1, |q'\xi_1|^{-\gamma}\} \min\{1, |r\xi_2|^{-\gamma}\} \min\{1, |s\xi_3|^{-\gamma}\}, \tag{4.1} \]

for all $\xi = (\xi_1, \xi_2, \xi_3) \in \mathbb{R}^3$. For the sake of brevity, we will often simply say that $\psi$ is $(\delta, \gamma)$-feasible.

Let us briefly comment on the decay assumptions in (4.1). If $\psi$ is compactly supported, then $\hat\psi$ will be a continuous function satisfying the decay assumptions as $|\xi| \to \infty$ for sufficiently small $\gamma > 0$. The decay condition controlled by $\delta$ can be seen as a vanishing moment condition in the $x_1$-direction, which suggests that a $(\delta, \gamma)$-feasible shearlet will behave as a wavelet in the $x_1$-direction.
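
The right-hand side of (4.1) is a simple product envelope, and evaluating it numerically shows the two regimes: vanishing near $\xi_1 = 0$ and polynomial decay in every coordinate. The sketch below is illustrative only; the function name and the default values $q = q' = r = s = 1$ are assumptions made for the example.

```python
def feasibility_envelope(xi, delta, gamma, q=1.0, q_prime=1.0, r=1.0, s=1.0):
    """Evaluate the product of the four min{1, .} factors bounding |psi_hat(xi)| in (4.1)."""

    def vanish(x):
        # min{1, |x|^delta}: forces a zero of order delta at xi_1 = 0
        return min(1.0, abs(x) ** delta)

    def decay(x):
        # min{1, |x|^(-gamma)}, written so that x = 0 is handled without division
        x = abs(x)
        return 1.0 if x <= 1.0 else x ** (-gamma)

    x1, x2, x3 = xi
    return vanish(q * x1) * decay(q_prime * x1) * decay(r * x2) * decay(s * x3)


if __name__ == "__main__":
    for point in [(0.0, 1.0, 1.0), (0.5, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 2.0, 1.0)]:
        print(point, feasibility_envelope(point, delta=8, gamma=4))
```

The envelope equals 1 only where $|q\xi_1| \ge 1$ and all three decay factors are inactive, which is the "effective support" referred to later in the covering arguments.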

5. Construction of compactly supported shearlets. In the following subsection we will describe the construction of pyramid-adapted shearlet systems with compactly supported generators. This construction uses ideas from the classical construction of wavelet frames in [12, §3.3.2]; we also refer to the recent construction of cone-adapted shearlet systems in $L^2(\mathbb{R}^2)$ described in the paper [20].

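5.1. Covering properties. We fix $\alpha \in (1,2]$, and let $\psi \in L^2(\mathbb{R}^3)$ be a feasible shearlet associated with $\mathcal{P}$. We then define the function $\Phi : \mathcal{P} \times \mathbb{R}^3 \to \mathbb{R}$ by

\[ \Phi(\xi, \omega) = \sum_{j \ge 0} \sum_{|k| \le \lceil 2^{j(\alpha-1)/2} \rceil} \big|\hat\psi(S_{-k}^T A_{2^{-j}} \xi)\big| \, \big|\hat\psi(S_{-k}^T A_{2^{-j}} \xi + \omega)\big|. \tag{5.1} \]

This function measures to which extent the effective parts of the supports of the scaled and sheared versions of the shearlet generator overlap. Moreover, it is linked to the so-called $t_q$-equations, albeit with absolute values of the functions in the sum (5.1). We also introduce the function $\Gamma : \mathbb{R}^3 \to \mathbb{R}$ defined by

\[ \Gamma(\omega) = \operatorname*{ess\,sup}_{\xi \in \mathcal{P}} \Phi(\xi, \omega), \]

measuring the maximal extent to which these scaled and sheared versions overlap for a given distance $\omega \in \mathbb{R}^3$. The values

\[ L_{\inf} = \operatorname*{ess\,inf}_{\xi \in \mathcal{P}} \Phi(\xi, 0) \quad \text{and} \quad L_{\sup} = \operatorname*{ess\,sup}_{\xi \in \mathcal{P}} \Phi(\xi, 0), \tag{5.2} \]

will relate to the classical discrete Calderón condition. Finally, the value

\[ R(c) = \sum_{m \in \mathbb{Z}^3 \setminus \{0\}} \big[\Gamma(M_c^{-1} m)\, \Gamma(-M_c^{-1} m)\big]^{1/2}, \quad \text{where } c = (c_1, c_2) \in \mathbb{R}^2_+, \tag{5.3} \]

measures the average of the symmetrized function values $\Gamma(M_c^{-1} m)$ and is again related to the so-called $t_q$-equations.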

We now first turn our attention to the terms $L_{\sup}$ and $R(c)$ and provide upper bounds for those. These estimates will later be used for estimates for frame bounds associated to a shearlet system; we remark that the to be derived estimates (5.5) and (5.7) also hold when the essential supremum in the definition of $L_{\sup}$ and $R(c)$ is taken over all $\xi \in \mathbb{R}^3$.

To estimate the effect of shearing, we will repeatedly use the following estimates:

\[ \sup_{(x,y) \in \mathbb{R}^2} \sum_{k \in \mathbb{Z}} \min\{1, |y|\} \min\{1, |x + ky|^{-\gamma}\} \le 3 + \frac{2}{\gamma - 1} =: C(\gamma) \tag{5.4} \]

and

\[ \sup_{(x,y) \in \mathbb{R}^2} \sum_{k \in \mathbb{Z} \setminus \{0\}} \min\{1, |y|\} \min\{1, |x + ky|^{-\gamma}\} \le 2 + \frac{2}{\gamma - 1} = C(\gamma) - 1 \]

for $\gamma > 1$.

Proposition 5.1. Suppose $\psi \in L^2(\mathbb{R}^3)$ is a $(\delta, \gamma)$-feasible shearlet with $\delta > 1$ and $\gamma > 1/2$. Then

i....<gc(27)-( ^_,,i,,,,„ + [^iog.a; 



< oo, (5.5) 
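where $C(\gamma) = 3 + \frac{2}{\gamma - 1}$.

Proof. By (4.1), we immediately have the following bound for $\Phi(\xi, 0)$:

\[ \Phi(\xi, 0) \le \sup_{\xi \in \mathbb{R}^3} \sum_{j \ge 0} \min\{1, |q 2^{-j\alpha/2} \xi_1|^{2\delta}\} \min\{1, |q' 2^{-j\alpha/2} \xi_1|^{-2\gamma}\} \]
\[ \qquad \cdot \sum_{k_1 \in \mathbb{Z}} \min\{1, |r(2^{-j/2}\xi_2 + k_1 2^{-j\alpha/2}\xi_1)|^{-2\gamma}\} \]
\[ \qquad \cdot \sum_{k_2 \in \mathbb{Z}} \min\{1, |s(2^{-j/2}\xi_3 + k_2 2^{-j\alpha/2}\xi_1)|^{-2\gamma}\}. \]

Letting $\eta_1 = q\xi_1$ and using that $q \ge r$ and $q \ge s$, we obtain

\[ \Phi(\xi, 0) \le \sup_{(\eta_1, \xi_2, \xi_3) \in \mathbb{R}^3} \sum_{j \ge 0} \min\{1, |2^{-j\alpha/2}\eta_1|^{2\delta-2}\} \min\{1, |q' q^{-1} 2^{-j\alpha/2}\eta_1|^{-2\gamma}\} \]
\[ \qquad \cdot \frac{q}{r} \sum_{k_1 \in \mathbb{Z}} \min\{1, |r q^{-1} 2^{-j\alpha/2}\eta_1|\} \min\{1, |r 2^{-j/2}\xi_2 + k_1 r q^{-1} 2^{-j\alpha/2}\eta_1|^{-2\gamma}\} \]
\[ \qquad \cdot \frac{q}{s} \sum_{k_2 \in \mathbb{Z}} \min\{1, |s q^{-1} 2^{-j\alpha/2}\eta_1|\} \min\{1, |s 2^{-j/2}\xi_3 + k_2 s q^{-1} 2^{-j\alpha/2}\eta_1|^{-2\gamma}\}. \tag{5.6} \]

By (5.4), the sum over $k_1 \in \mathbb{Z}$ in (5.6) is bounded by $\frac{q}{r} C(2\gamma)$. Similarly, the sum over $k_2 \in \mathbb{Z}$ in (5.6) is bounded by $\frac{q}{s} C(2\gamma)$. Hence, we can continue (5.6) by

\[ \Phi(\xi, 0) \le \frac{q^2}{rs} C(2\gamma)^2 \sup_{\eta_1 \in \mathbb{R}} \sum_{j \ge 0} \min\{1, |2^{-j\alpha/2}\eta_1|^{2\delta-2}\} \min\{1, |q' q^{-1} 2^{-j\alpha/2}\eta_1|^{-2\gamma}\} \]
\[ = \frac{q^2}{rs} C(2\gamma)^2 \sup_{\eta_1 \in \mathbb{R}} \sum_{j \ge 0} \Big( |2^{-j\alpha/2}\eta_1|^{2\delta-2} \chi_{[0,1)}(|2^{-j\alpha/2}\eta_1|) + \chi_{[1, q/q')}(|2^{-j\alpha/2}\eta_1|) + |q' q^{-1} 2^{-j\alpha/2}\eta_1|^{-2\gamma} \chi_{[q/q', \infty)}(|2^{-j\alpha/2}\eta_1|) \Big) \]
\[ \le \frac{q^2}{rs} C(2\gamma)^2 \sup_{\eta_1 \in \mathbb{R}} \Big( \sum_{|2^{-j\alpha/2}\eta_1| \le 1} |2^{-j\alpha/2}\eta_1|^{2\delta-2} + \sum_{j \ge 0} \chi_{[1, q/q')}(|2^{-j\alpha/2}\eta_1|) + \sum_{|q' q^{-1} 2^{-j\alpha/2}\eta_1| \ge 1} |q' q^{-1} 2^{-j\alpha/2}\eta_1|^{-2\gamma} \Big). \]

The claim (5.5) now follows from (A.1), (A.2) and (A.3). □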

The next result, Proposition 5.2, exhibits how $R(c)$ depends on the parameters $c_1$ and $c_2$ from the translation matrix $M_c$. In particular, we see that the size of $R(c)$ can be controlled by choosing $c_1$ and $c_2$ small. The result can be simplified as follows: For any $\gamma'$ satisfying $1 < \gamma' < \gamma - 2$, there exist positive constants $\kappa_1$ and $\kappa_2$ independent of $c_1$ and $c_2$ such that

\[ R(c) \le \kappa_1 c_1^{\gamma} + \kappa_2 c_2^{\gamma - \gamma'}. \]

The constants $\kappa_1$ and $\kappa_2$ depend on the parameters $q$, $q'$, $r$, $s$, $\delta$ and $\gamma$, and the result below shows exactly what this dependence is.

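Proposition 5.2. Let $\psi \in L^2(\mathbb{R}^3)$ be a $(\delta, \gamma)$-feasible shearlet for $\delta > 2\gamma > 6$, and let the translation lattice parameters $c = (c_1, c_2)$ satisfy $c_1 \ge c_2 > 0$. Then, for any $\gamma'$ satisfying $1 < \gamma' < \gamma - 2$, we have

\[ R(c) \le T_1 \big(8\zeta(\gamma - 2) - 4\zeta(\gamma - 1) + 2\zeta(\gamma)\big) + 3 \min\Big\{ \Big\lceil \frac{c_1}{c_2} \Big\rceil, 2 \Big\} T_2 \big(16\zeta(\gamma - 2) - 4\zeta(\gamma - 1)\big) + T_3 \big(24\zeta(\gamma - 2) + 2\zeta(\gamma)\big), \tag{5.7} \]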



where 



rs 

r,2 



+ 



1 



1 _ 2-*+27 



+ 



1-2-7; 



rs \ q' min{r, s} 



+ 



1 _ 2-^+27 
1 



1 



1 — 2-7 1 — 2-''+7+7' 1 — 2-7' 



and $\zeta$ is the Riemann zeta function.

Proof. The proof can be found in Appendix B. □

The tightness of the estimates of $R(c)$ in Proposition 5.2 is important for the construction of shearlet frames in the next section since the estimated frame bounds will depend heavily on the estimate of $R(c)$. If we allowed a cruder estimate of $R(c)$, the proof of Proposition 5.2 could be considerably simplified; as we do not allow this, the slightly technical proof is relegated to the appendix.

5.2. Frame constructions. The results in this section (except Corollary 5.6) are presented without proofs since these are straightforward generalizations of results on cone-adapted shearlet frames for $L^2(\mathbb{R}^2)$ from [20]. We first formulate a general sufficient condition for the existence of pyramid-adapted shearlet frames.

Theorem 5.3. Let $\psi \in L^2(\mathbb{R}^3)$ be a $(\delta, \gamma)$-feasible shearlet (associated with $\mathcal{P}$) for $\delta > 2\gamma > 6$, and let the translation lattice parameters $c = (c_1, c_2)$ satisfy $c_1 \ge c_2 > 0$. If $R(c) < L_{\inf}$, then $\Psi(\psi)$ is a frame for $\check{L}^2(\mathcal{P}) := \{f \in L^2(\mathbb{R}^3) : \operatorname{supp} \hat{f} \subset \mathcal{P}\}$ with frame bounds $A$ and $B$ satisfying

\[ \frac{1}{|\det M_c|} \big[L_{\inf} - R(c)\big] \le A \le B \le \frac{1}{|\det M_c|} \big[L_{\sup} + R(c)\big]. \]
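
The frame bound estimates of Theorem 5.3 are elementary to evaluate once $L_{\inf}$, $L_{\sup}$ and $R(c)$ are known (or bounded). The sketch below uses a hypothetical helper name and made-up input values; it only encodes the two inequalities above together with $\det M_c = c_1 c_2^2$ for $M_c = \operatorname{diag}(c_1, c_2, c_2)$.

```python
def frame_bound_estimates(l_inf, l_sup, r_c, c1, c2):
    """Return (lower, upper) with lower <= A <= B <= upper as in Theorem 5.3.

    Raises ValueError if the hypotheses c1 >= c2 > 0 or R(c) < L_inf fail.
    """
    if not (0 < c2 <= c1):
        raise ValueError("need c1 >= c2 > 0")
    if r_c >= l_inf:
        raise ValueError("need R(c) < L_inf for a positive lower bound")
    det = c1 * c2 * c2  # determinant of diag(c1, c2, c2)
    return (l_inf - r_c) / det, (l_sup + r_c) / det


if __name__ == "__main__":
    # Illustrative numbers only, not values computed from an actual generator.
    lower, upper = frame_bound_estimates(1.0, 3.0, 0.5, 0.5, 0.25)
    print(lower, upper, upper / lower)
```

The ratio `upper / lower` does not depend on the lattice determinant, so the estimated conditioning of the frame is governed entirely by how small $R(c)$ is relative to $L_{\inf}$.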



Let us comment on the sufficient condition for the existence of shearlet frames in Theorem 5.3. Firstly, to obtain a lower frame bound $A$, we choose a shearlet generator $\psi$ such that

\[ \mathcal{P} \subset \bigcup_{j \ge 0} \; \bigcup_{|k| \le \lceil 2^{j(\alpha-1)/2} \rceil} A_{2^j} S_k^T \Omega, \tag{5.8} \]

where

\[ \Omega = \{\xi \in \mathbb{R}^3 : |\hat\psi(\xi)| > \rho\} \quad \text{for some } \rho > 0. \]

For instance, one can choose $\Omega = [1,2] \times [-1/2, 1/2] \times [-1/2, 1/2]$ here. From (5.8), we have $L_{\inf} > \rho^2$. Secondly, note that $R(c) \to 0$ as $c_1 \to 0+$ and $c_2 \to 0+$ by Proposition 5.2 (see $T_1$, $T_2$, and $T_3$ in (5.7)). In particular, for a given $L_{\inf} > 0$, one can make $R(c)$ sufficiently small for some translation lattice parameter $c = (c_1, c_2)$ so that $L_{\inf} - R(c) > 0$. Finally, Propositions 5.1 and 5.2 imply the existence of an upper frame bound $B$. We refer to [23] for concrete examples with frame bound estimates.

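By the following result we then have an explicitly given family of shearlets satisfying the assumptions of Theorem 5.3 at our disposal.

Theorem 5.4. Let $K, L \in \mathbb{N}$ be such that $L \ge 10$ and $\frac{3L}{2} \le K \le 3L - 2$, and define a shearlet $\psi \in L^2(\mathbb{R}^3)$ by

\[ \hat\psi(\xi) = m_1(4\xi_1)\, \hat\phi(\xi_1)\, \hat\phi(2\xi_2)\, \hat\phi(2\xi_3), \quad \xi = (\xi_1, \xi_2, \xi_3) \in \mathbb{R}^3, \]

where $m_0$ is the low pass filter satisfying

\[ |m_0(\xi_1)|^2 = (\cos(\pi\xi_1))^{2K} \sum_{n=0}^{L-1} \binom{K-1+n}{n} (\sin(\pi\xi_1))^{2n}, \quad \xi_1 \in \mathbb{R}, \]

$m_1$ is the associated bandpass filter defined by

\[ |m_1(\xi_1)|^2 = |m_0(\xi_1 + 1/2)|^2, \quad \xi_1 \in \mathbb{R}, \]

and $\phi$ is the scaling function given by

\[ \hat\phi(\xi_1) = \prod_{j=0}^{\infty} m_0(2^{-j}\xi_1), \quad \xi_1 \in \mathbb{R}. \]

Then there exists a sampling constant $\hat{c}_1 > 0$ such that the shearlet system $\Psi(\psi)$ forms a frame for $\check{L}^2(\mathcal{P})$ for any sampling matrix $M_c$ with $c = (c_1, c_2) \in (\mathbb{R}_+)^2$ and $c_2 \le c_1 \le \hat{c}_1$. Furthermore, the corresponding frame bounds $A$ and $B$ satisfy

\[ \frac{1}{|\det(M_c)|} \big[L_{\inf} - R(c)\big] \le A \le B \le \frac{1}{|\det(M_c)|} \big[L_{\sup} + R(c)\big], \]

where $R(c) < L_{\inf}$.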
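
The filter in Theorem 5.4 can be evaluated directly. The following sketch assumes the squared modulus $|m_0(\xi_1)|^2 = (\cos \pi\xi_1)^{2K} \sum_{n=0}^{L-1} \binom{K-1+n}{n} (\sin \pi\xi_1)^{2n}$ and $|m_1(\xi_1)|^2 = |m_0(\xi_1 + 1/2)|^2$; the function names are illustrative, and only squared moduli are computed since the phase of $m_0$ is not specified here.

```python
import math


def m0_squared(xi, K, L):
    """|m_0(xi)|^2 for the low pass filter with parameters K and L."""
    c2 = math.cos(math.pi * xi) ** 2
    s2 = math.sin(math.pi * xi) ** 2
    return c2 ** K * sum(math.comb(K - 1 + n, n) * s2 ** n for n in range(L))


def m1_squared(xi, K, L):
    """|m_1(xi)|^2 = |m_0(xi + 1/2)|^2 for the associated bandpass filter."""
    return m0_squared(xi + 0.5, K, L)


if __name__ == "__main__":
    K, L = 15, 10  # satisfies L >= 10 and 3L/2 <= K <= 3L - 2
    for xi in (0.0, 0.125, 0.25, 0.375, 0.5):
        print(xi, m0_squared(xi, K, L), m1_squared(xi, K, L))
```

The printed values show the expected behavior: $|m_0|^2$ is 1 at $\xi_1 = 0$ and vanishes to high order at $\xi_1 = 1/2$, while $|m_1|^2$ does the opposite.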

Theorem 5.4 provides us with a family of compactly supported shearlet frames for $\check{L}^2(\mathcal{P})$. For these shearlet systems there is a bias towards the $x_1$ axis, especially at coarse scales, since they are defined for $\check{L}^2(\mathcal{P})$, and hence, the frequency support of the shearlet elements overlaps more significantly along the $x_1$ axis. In order to control the upper frame bound, it is therefore desirable to have a denser translation lattice in the direction of the $x_1$ axis than in the other axis directions, i.e., $c_1 \ge c_2$.

In the next result we extend the construction from Theorem 5.4 for $\check{L}^2(\mathcal{P})$ to all of $L^2(\mathbb{R}^3)$. We remark that this type of extension result differs from the similar extension for band-limited (tight) shearlet frames since in the latter extension procedure one needs to introduce artificial projections of the frame elements onto the pyramids in the Fourier domain.

Theorem 5.5. Let $\psi \in L^2(\mathbb{R}^3)$ be the shearlet with associated scaling function $\phi \in L^2(\mathbb{R})$ introduced in Theorem 5.4, and set $\phi(x_1, x_2, x_3) = \phi(x_1)\phi(x_2)\phi(x_3)$, $\tilde\psi(x_1, x_2, x_3) = \psi(x_2, x_1, x_3)$, and $\breve\psi(x_1, x_2, x_3) = \psi(x_3, x_2, x_1)$. Then the corresponding shearlet system $SH(\phi, \psi, \tilde\psi, \breve\psi; c, \alpha)$ forms a frame for $L^2(\mathbb{R}^3)$ for the sampling matrices $M_c$, $\tilde{M}_c$, and $\breve{M}_c$ with $c = (c_1, c_2) \in (\mathbb{R}_+)^2$ and $c_2 \le c_1 \le \hat{c}_1$.

For the pyramid $\mathcal{P}$, we allow for a denser translation lattice $M_c \mathbb{Z}^3$ along the $x_1$ axis, i.e., $c_2 \le c_1$, precisely as in Theorem 5.4. For the other pyramids $\tilde{\mathcal{P}}$ and $\breve{\mathcal{P}}$, we analogously allow for a denser translation lattice along the $x_2$ and $x_3$ axes, respectively; since the positions of $c_1$ and $c_2$ in $\tilde{M}_c$ and $\breve{M}_c$ are changed accordingly, this still corresponds to $c_2 \le c_1$.

The final result of this section generalizes Theorem 5.5 in the sense that it shows that not only the shearlet introduced in Theorem 5.4, but also any $(\delta, \gamma)$-feasible shearlet satisfying (5.8) generates a shearlet frame for $L^2(\mathbb{R}^3)$ provided that $\delta > 2\gamma > 6$. For this, we change the definition of $R(c)$, $L_{\inf}$ and $L_{\sup}$ in (5.2) and (5.3) so that the essential infimum and supremum are taken over all of $\mathbb{R}^3$ and not only over the pyramid $\mathcal{P}$, and we denote these new constants again by $R(c)$, $L_{\inf}$ and $L_{\sup}$.

Corollary 5.6. Let $\psi \in L^2(\mathbb{R}^3)$ be a $(\delta, \gamma)$-feasible shearlet for $\delta > 2\gamma > 6$. Also, define $\tilde\psi$ and $\breve\psi$ as in Theorem 5.5 and choose $\phi \in L^2(\mathbb{R}^3)$ such that $|\hat\phi(\xi)| \lesssim (1 + |\xi|)^{-\gamma}$. Suppose that $L_{\inf} > 0$. Then $SH(\phi, \psi, \tilde\psi, \breve\psi; c, \alpha)$ forms a frame for $L^2(\mathbb{R}^3)$ for the sampling matrices $M_c$, $\tilde{M}_c$, and $\breve{M}_c$ for some translation lattice parameter $c = (c_1, c_2)$.

Proof. The proofs of Propositions 5.1 and 5.2 show that the same estimates as in (5.5) and (5.7) hold for our new $R(c)$ and $L_{\sup}$; this is easily seen since the very first estimate in both these proofs extends the supremum from $\mathcal{P}$ to $\mathbb{R}^3$. Furthermore, by Proposition 5.2, one can choose $c = (c_1, c_2)$ such that $L_{\inf} - R(c) > 0$. Now, we have that $L_{\sup} + R(c)$ is bounded and $L_{\inf} - R(c) > 0$. Since $R(c)$ and $L_{\sup}$ are associated to the $t_q$-terms and a discrete Calderón condition, respectively, following arguments as in [12, §3.3.2] or [20] show that frame bounds $A$ and $B$ exist and that

\[ 0 < (L_{\inf} - R(c)) / \det M_c \le A \le B \le (R(c) + L_{\sup}) / \det M_c < \infty. \]

□

6. Optimal sparsity of 3D shearlets. Having 3D shearlet frames with compactly supported generators at hand by Theorem 5.5, we turn to sparse approximation of cartoon-like images by these shearlet systems.

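6.1. Sparse approximations of 3D Data. Suppose $SH(\phi, \psi, \tilde\psi, \breve\psi; c, \alpha)$ forms a frame for $L^2(\mathbb{R}^3)$ with frame bounds $A$ and $B$. Since the shearlet system is a countable set of functions, we can denote it by $SH(\phi, \psi, \tilde\psi, \breve\psi; c, \alpha) = \{\sigma_i\}_{i \in I}$ for some countable index set $I$. We let $\{\tilde\sigma_i\}_{i \in I}$ be the canonical dual frame of $\{\sigma_i\}_{i \in I}$. As our $N$-term approximation $f_N$ of a cartoon-like image $f \in \mathcal{E}^\beta_\alpha(\mathbb{R}^3)$ by the frame $SH(\phi, \psi, \tilde\psi, \breve\psi; c, \alpha)$, we then take, as in Equation (3.3),

\[ f_N = \sum_{i \in I_N} c_i \tilde\sigma_i, \qquad c_i = \langle f, \sigma_i \rangle, \]

where $(\langle f, \sigma_i \rangle)_{i \in I_N}$ are the $N$ largest coefficients $\langle f, \sigma_i \rangle$ in magnitude.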
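
The selection rule "keep the $N$ largest coefficients in magnitude" is easy to state in code. The sketch below is a toy version under a simplifying assumption: the system is an orthonormal basis, so the dual frame coincides with the system itself and the $L^2$ error equals the $\ell^2$ norm of the discarded coefficients. For a general frame only an upper bound of this type holds. The function name is illustrative.

```python
import math


def n_term_approximation(coeffs, n_terms):
    """Keep the n_terms largest coefficients in magnitude; zero out the rest.

    Returns (thresholded coefficients, l2 norm of the discarded coefficients).
    In an orthonormal basis the second value is the L2 approximation error.
    """
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    kept = set(order[:n_terms])
    approx = [c if i in kept else 0.0 for i, c in enumerate(coeffs)]
    error = math.sqrt(sum(c * c for i, c in enumerate(coeffs) if i not in kept))
    return approx, error


if __name__ == "__main__":
    # Synthetic coefficients with decay n^(-1), the alpha = 2 benchmark up to logs.
    coeffs = [(-1) ** n / n for n in range(1, 2001)]
    for N in (10, 100, 1000):
        _, err = n_term_approximation(coeffs, N)
        print(N, err ** 2)
```

With coefficients decaying like $n^{-1}$, the printed squared errors shrink roughly like $N^{-1}$, in line with the tail-sum heuristic behind the benchmark rates.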

The benchmark for optimal sparse approximations that we are aiming for is, as we showed in Section 3, for all $f = f_0 + \chi_B f_1 \in \mathcal{E}^\beta_\alpha(\mathbb{R}^3)$,

\[ \|f - f_N\|^2_{L^2} \lesssim N^{-\alpha/2} \quad \text{as } N \to \infty, \]

and

\[ |c^*_n| \lesssim n^{-\frac{\alpha+2}{4}} \quad \text{as } n \to \infty, \]

where $c^* = (c^*_n)_{n \in \mathbb{N}}$ is a decreasing (in modulus) rearrangement of $c = (c_i)_{i \in I}$. The following result shows that compactly supported pyramid-adapted, hybrid shearlets almost deliver this approximation rate for all $1 < \alpha \le \beta \le 2$. We remind the reader that the parameters $\nu$ and $\mu$, suppressed in our notation $\mathcal{E}^\beta_\alpha(\mathbb{R}^3)$, are bounds of the homogeneous Hölder $\dot{C}^\alpha$ norm of the radius function for the discontinuity surface $\partial B$ and of the $C^\beta$ norms of $f_0$ and $f_1$, respectively.

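Theorem 6.1. Let $\alpha \in (1,2]$, $c \in (\mathbb{R}_+)^2$, and let $\phi, \psi, \tilde\psi, \breve\psi \in L^2(\mathbb{R}^3)$ be compactly supported. Suppose that, for all $\xi = (\xi_1, \xi_2, \xi_3) \in \mathbb{R}^3$, the function $\psi$ satisfies:

(i) $|\hat\psi(\xi)| \le C \cdot \min\{1, |\xi_1|^\delta\} \cdot \min\{1, |\xi_1|^{-\gamma}\} \cdot \min\{1, |\xi_2|^{-\gamma}\} \cdot \min\{1, |\xi_3|^{-\gamma}\}$,

(ii) $\left| \frac{\partial}{\partial \xi_i} \hat\psi(\xi) \right| \le |h(\xi_1)| \cdot \left(1 + \frac{|\xi_2|}{|\xi_1|}\right)^{-\gamma} \left(1 + \frac{|\xi_3|}{|\xi_1|}\right)^{-\gamma}$, $i = 2, 3$,

where $\delta > 8$, $\gamma \ge 4$, $h \in L^1(\mathbb{R})$, and $C$ a constant, and suppose that $\tilde\psi$ and $\breve\psi$ satisfy analogous conditions with the obvious change of coordinates. Further, suppose that the shearlet system $SH(\phi, \psi, \tilde\psi, \breve\psi; c, \alpha)$ forms a frame for $L^2(\mathbb{R}^3)$.

Let $\tau = \tau(\alpha)$ be given by

\[ \tau(\alpha) = \frac{3(2-\alpha)(\alpha-1)(\alpha+2)}{2(9\alpha^2 + 17\alpha - 10)}, \tag{6.1} \]

and let $\beta \in [\alpha, 2]$. Then, for any $\nu, \mu > 0$, the shearlet frame $SH(\phi, \psi, \tilde\psi, \breve\psi; c, \alpha)$ provides nearly optimally sparse approximations of functions $f \in \mathcal{E}^\beta_\alpha(\mathbb{R}^3)$ in the sense that

\[ \|f - f_N\|^2_{L^2} \lesssim \begin{cases} N^{-\alpha/2 + \tau}, & \text{if } \beta \in [\alpha, 2), \\ N^{-1} (\log N)^2, & \text{if } \beta = \alpha = 2, \end{cases} \quad \text{as } N \to \infty, \tag{6.2} \]

where $f_N$ is the $N$-term approximation obtained by choosing the $N$ largest shearlet coefficients of $f$, and

\[ |c^*_n| \lesssim \begin{cases} n^{-\frac{\alpha+2}{4} + \frac{\tau}{2}}, & \text{if } \beta \in [\alpha, 2), \\ n^{-1} \log n, & \text{if } \beta = \alpha = 2, \end{cases} \quad \text{as } n \to \infty, \tag{6.3} \]

where $c = \{\langle f, \mathring\psi_\lambda \rangle : \lambda \in \Lambda,\ \mathring\psi = \psi, \tilde\psi, \text{ or } \breve\psi\}$ and $c^* = (c^*_n)_{n \in \mathbb{N}}$ is a decreasing (in modulus) rearrangement of $c$.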

We postpone the proof of Theorem 6.1 until Section 9. The sought optimal approximation error rate in (6.2) was $N^{-\alpha/2}$, hence for $\alpha = 2$ the obtained rate (6.2) is almost optimal in the sense that it is only a polylog factor $(\log N)^2$ away from the optimal rate. However, for $\alpha \in (1,2)$ we are a power of $N$ with exponent $\tau$ away from the optimal rate. The exponent $\tau$ is close to negligible; in particular, we have that $0 < \tau(\alpha) < 0.04$ for $\alpha \in (1,2)$ and that $\tau(\alpha) \to 0$ for $\alpha \to 1+$ or $\alpha \to 2-$, see also Figure 6.1. The approximation error rate (6.2) obtained for $\alpha < 2$ can also be expressed as

\[ \|f - f_N\|^2_{L^2} = O\Big(N^{-\frac{6\alpha^3 + 7\alpha^2 - 11\alpha + 6}{9\alpha^2 + 17\alpha - 10}}\Big), \]

which, of course, still is a $\tau = \tau(\alpha)$ exponent away from being optimal. Let us mention that a slightly better estimate $\tau(\alpha)$ can be obtained satisfying $\tau(\alpha) < 0.037$ for $\alpha \in (1,2)$, but the expression becomes overly complicated; we can, however, with the current proof of Theorem 6.1 not make $\tau(\alpha)$ arbitrarily small. As $\alpha \to 2-$ we see that the exponent $-\alpha/2 + \tau \to -1$; however, for $\alpha = \beta = 2$ an additional log factor appears in the approximation error rate. This jump in the error rate is a consequence of our proof technique, and it might be that a truly optimal decay rate depends continuously on the model parameters.
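
The size of the gap $\tau(\alpha)$ and the equivalence of the two ways of writing the obtained exponent can be checked numerically. The sketch below uses illustrative function names and simply evaluates $\tau(\alpha) = \frac{3(2-\alpha)(\alpha-1)(\alpha+2)}{2(9\alpha^2+17\alpha-10)}$ and the rational exponent $\frac{6\alpha^3+7\alpha^2-11\alpha+6}{9\alpha^2+17\alpha-10}$ on a grid.

```python
def tau(alpha):
    """Optimality gap tau(alpha) from (6.1)."""
    return 3 * (2 - alpha) * (alpha - 1) * (alpha + 2) / (2 * (9 * alpha**2 + 17 * alpha - 10))


def obtained_exponent(alpha):
    """Exponent of the obtained rate, which should equal alpha/2 - tau(alpha)."""
    return (6 * alpha**3 + 7 * alpha**2 - 11 * alpha + 6) / (9 * alpha**2 + 17 * alpha - 10)


if __name__ == "__main__":
    grid = [1 + i / 1000 for i in range(1001)]
    worst = max(grid, key=tau)
    print("largest gap on the grid:", tau(worst), "at alpha =", worst)
    print("max deviation between the two expressions:",
          max(abs(obtained_exponent(a) - (a / 2 - tau(a))) for a in grid))
```

On this grid the gap stays below 0.04 and vanishes at both endpoints, and the two expressions for the exponent agree up to rounding.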
If the smoothness of the discontinuity surface $C^\alpha$ of a 3D cartoon-like image approaches $C^1$ smoothness, we lose so much directional information that we do not gain anything by using a directional representation system, and we might as well use a standard wavelet system, see Example 1 and Figure 6.1(a). However, as the discontinuity surface becomes smoother, that is, as $\alpha$ approaches 2, we acquire enough directional information about the singularity for directional representation systems to become a better choice; exactly how one should adapt the directional representation system to the smoothness of the singularity is seen from the definition of our hybrid shearlet system.

The constants in the expressions in (6.2) depend only on $\nu$ and $\mu$, where $\nu$ is a bound of the homogeneous Hölder norm for the radius function $\rho \in C^\alpha$ associated with the discontinuity surface $\partial B$ and $\mu$ is the bound of the Hölder norm of $f_0, f_1 \in C^\beta(\mathbb{R}^3)$ with $f = f_0 + \chi_B f_1$, see also Definition 2.1. We remark that these constants grow with $\nu$ and $\mu$, hence we cannot allow $f = f_0 + \chi_B f_1$ with only $\|f_i\|_{C^\beta} < \infty$.




(a) Graph of $\alpha/2 - \tau(\alpha)$ and the optimal rate $\alpha/2$ (dashed) as a function of $\alpha$.

(b) Graph of $\tau(\alpha)$ given by (6.1).

Figure 6.1. The optimality gap for $\beta \in [\alpha, 2)$: Figure 6.1a shows the optimal and the obtained rate, and Figure 6.1b their difference $\tau(\alpha)$.

Let us also briefly discuss the two decay assumptions in the frequency domain on the shearlet generators in Theorem 6.1. Condition (i) says that $\psi$ is $(\delta, \gamma)$-feasible and can be interpreted as both a condition ensuring almost separable behavior and controlling the effective support of the shearlets in frequency domain as well as a moment condition along the $x_1$ axis, hence enforcing directional selectivity. Condition (ii), together with (i), is a weak version of a directional vanishing moment condition (see [13] for a precise definition), which is crucial for having fast decay of the shearlet coefficients when the corresponding shearlet intersects the discontinuity surface. We refer to the exposition [23] for a detailed explanation of the necessity of conditions (i) and (ii). Conditions (i) and (ii) are rather mild conditions on the generators; in particular, shearlets constructed by Theorems 5.4 and 5.5, with extra assumptions on the parameters $K$ and $L$, will indeed satisfy (i) and (ii) in Theorem 6.1. To compare with the optimality result for band-limited generators we wish to point out that conditions (i) and (ii) are obviously satisfied for band-limited generators.

Theorem 1.3 in [24] shows optimal sparse approximation of compactly supported shearlets in 2D. Theorem 6.1 is similar in spirit to Theorem 1.3 in [24], but for the three-dimensional setting. However, as opposed to the two-dimensional setting, anisotropic structures in three-dimensional data comprise two morphologically different types of structures, namely surfaces and curves. It would therefore be desirable to have a similar optimality result for our extended 3D image class $\mathcal{E}^\beta_{\alpha,L}(\mathbb{R}^3)$ which also allows types of curve-like singularities. Yet, the pyramid-adapted shearlets introduced in Section 4.1 are plate-like and thus, a priori, not well-suited for capturing such one-dimensional singularities. However, these plate-like shearlet systems still deliver the nearly optimal error rate as the following result shows. The proof of the result is postponed to Section 10.

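Theorem 6.2. Let $\alpha \in (1,2]$, $c \in (\mathbb{R}_+)^2$, and let $\phi, \psi, \tilde\psi, \breve\psi \in L^2(\mathbb{R}^3)$ be compactly supported. For each $\kappa \in [-1,1]$ and $x_3 \in \mathbb{R}$, define $g^0_{\kappa, x_3} \in L^2(\mathbb{R}^2)$ by

\[ g^0_{\kappa, x_3}(x_1, x_2) = \psi(x_1, x_2, \kappa x_2 + x_3), \]

and, for each $\kappa \in [-1,1]$ and $x_2 \in \mathbb{R}$, define $g^1_{\kappa, x_2} \in L^2(\mathbb{R}^2)$ by

\[ g^1_{\kappa, x_2}(x_1, x_3) = \psi(x_1, \kappa x_3 + x_2, x_3). \]

Suppose that, for all $\xi = (\xi_1, \xi_2, \xi_3) \in \mathbb{R}^3$, $\kappa \in [-1,1]$, and $x_2, x_3 \in \mathbb{R}$, the function $\psi$ satisfies:

(i) $|\hat\psi(\xi)| \le C \cdot \min\{1, |\xi_1|^\delta\} \cdot \min\{1, |\xi_1|^{-\gamma}\} \cdot \min\{1, |\xi_2|^{-\gamma}\} \cdot \min\{1, |\xi_3|^{-\gamma}\}$,

(ii) $\left| \left(\frac{\partial}{\partial \xi_2}\right)^{\ell} \hat{g}^0_{\kappa, x_3}(\xi_1, \xi_2) \right| \le |h(\xi_1)| \cdot \left(1 + \frac{|\xi_2|}{|\xi_1|}\right)^{-\gamma}$ for $\ell = 0, 1$,

(iii) $\left| \left(\frac{\partial}{\partial \xi_3}\right)^{\ell} \hat{g}^1_{\kappa, x_2}(\xi_1, \xi_3) \right| \le |h(\xi_1)| \cdot \left(1 + \frac{|\xi_3|}{|\xi_1|}\right)^{-\gamma}$ for $\ell = 0, 1$,

where $\delta > 8$, $\gamma \ge 4$, $h \in L^1(\mathbb{R})$, and $C$ a constant, and suppose that $\tilde\psi$ and $\breve\psi$ satisfy analogous conditions with the obvious change of coordinates. Further, suppose that the shearlet system $SH(\phi, \psi, \tilde\psi, \breve\psi; c, \alpha)$ forms a frame for $L^2(\mathbb{R}^3)$.

Let $\beta \in [\alpha, 2]$. Then, for any $\nu > 0$, $L > 0$, and $\mu > 0$, the shearlet frame $SH(\phi, \psi, \tilde\psi, \breve\psi; c, \alpha)$ provides nearly optimally sparse approximations of functions $f \in \mathcal{E}^\beta_{\alpha,L}(\mathbb{R}^3)$ in the sense that

\[ \|f - f_N\|^2_{L^2} \lesssim \begin{cases} N^{-\alpha/2 + \tau}, & \text{if } \beta \in [\alpha, 2), \\ N^{-1} (\log N)^2, & \text{if } \beta = \alpha = 2, \end{cases} \quad \text{as } N \to \infty, \]

and

\[ |c^*_n| \lesssim \begin{cases} n^{-\frac{\alpha+2}{4} + \frac{\tau}{2}}, & \text{if } \beta \in [\alpha, 2), \\ n^{-1} \log n, & \text{if } \beta = \alpha = 2, \end{cases} \quad \text{as } n \to \infty, \]

where $\tau = \tau(\alpha)$ is given by (6.1).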

We remark that there exist numerous examples of $\psi$, $\tilde\psi$, and $\breve\psi$ satisfying the conditions (i) and (ii) in Theorem 6.1 and the conditions (i)-(iii) in Theorem 6.2. One large class of examples are separable generators $\psi, \tilde\psi, \breve\psi \in L^2(\mathbb{R}^3)$, i.e.,

\[ \psi(x) = \eta(x_1)\varphi(x_2)\varphi(x_3), \quad \tilde\psi(x) = \varphi(x_1)\eta(x_2)\varphi(x_3), \quad \breve\psi(x) = \varphi(x_1)\varphi(x_2)\eta(x_3), \]

where $\eta, \varphi \in L^2(\mathbb{R})$ are compactly supported functions satisfying:

(i) $|\hat\eta(\omega)| \le C_1 \cdot \min\{1, |\omega|^\delta\} \cdot \min\{1, |\omega|^{-\gamma}\}$,

(ii) $\left| \left(\frac{\partial}{\partial \omega}\right)^{\ell} \hat\varphi(\omega) \right| \le C_2 \cdot \min\{1, |\omega|^{-\gamma}\}$ for $\ell = 0, 1$,

for $\omega \in \mathbb{R}$, where $\delta > 8$, $\gamma \ge 4$, and $C_1, C_2$ are constants. Then it is straightforward to check that the shearlet $\psi$ satisfies the conditions (i)-(iii) in Theorem 6.2 and that $\tilde\psi, \breve\psi$ satisfy analogous conditions as required in Theorem 6.2. Thus, we have the following result.

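Corollary 6.3. Let $\alpha \in (1,2]$, $c \in (\mathbb{R}_+)^2$, and let
$\eta, \varphi \in L^2(\mathbb{R})$ be compactly supported functions satisfying:

(i) $|\hat\eta(\omega)| \le C_1 \cdot \min\{1,|\omega|^{\delta}\} \cdot \min\{1,|\omega|^{-\gamma}\}$,

(ii) $\bigl|\bigl(\tfrac{\partial}{\partial\omega}\bigr)^{\ell} \hat\varphi(\omega)\bigr| \le C_2 \cdot \min\{1,|\omega|^{-\gamma}\}$ for $\ell = 0,1$,

for $\omega \in \mathbb{R}$, where $\delta > 8$, $\gamma \ge 4$, and $C_1$ and $C_2$ are
constants. Let $\phi \in L^2(\mathbb{R}^3)$ be compactly supported, and let
$\psi, \tilde\psi, \breve\psi \in L^2(\mathbb{R}^3)$ be defined by:

\[ \psi(x) = \eta(x_1)\varphi(x_2)\varphi(x_3), \quad
\tilde\psi(x) = \varphi(x_1)\eta(x_2)\varphi(x_3), \quad
\breve\psi(x) = \varphi(x_1)\varphi(x_2)\eta(x_3). \]

Suppose that the shearlet system $SH(\phi,\psi,\tilde\psi,\breve\psi;c,\alpha)$ forms a
frame for $L^2(\mathbb{R}^3)$.

Let $\beta \in [\alpha,2]$. Then, for any $\nu > 0$, $L > 0$, and $\mu > 0$, the shearlet
frame $SH(\phi,\psi,\tilde\psi,\breve\psi;c,\alpha)$ provides nearly optimally sparse
approximations of functions $f \in \mathcal{E}^\beta_{\alpha,L}(\mathbb{R}^3)$ in the sense that

\[ \|f - f_N\|_{L^2}^2 \lesssim
\begin{cases}
N^{-\alpha/2+\tau}, & \text{if } \beta \in [\alpha,2), \\
N^{-1}(\log N)^2, & \text{if } \beta = \alpha = 2,
\end{cases}
\qquad \text{as } N \to \infty, \]

and

\[ |c^*_n| \lesssim
\begin{cases}
n^{-(\alpha+2)/4+\tau/2}, & \text{if } \beta \in [\alpha,2), \\
n^{-1}\log n, & \text{if } \beta = \alpha = 2,
\end{cases}
\qquad \text{as } n \to \infty, \]

where $\tau = \tau(\alpha)$ is given by (6.1).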

In the remaining sections of the paper we will prove Theorem 6.1 and Theorem 6.2. 

6.2. General Organization of the Proofs of Theorems 6.1 and 6.2. Fix
$\alpha \in (1,2]$ and $c \in (\mathbb{R}_+)^2$, and take $B \in STAR^\alpha(\nu)$ and
$f = f_0 + \chi_B f_1 \in \mathcal{E}^\beta_\alpha(\mathbb{R}^3)$.
Suppose $SH(\phi,\psi,\tilde\psi,\breve\psi;c,\alpha)$ satisfies the hypotheses of
Theorem 6.1. Then by condition (i) the generators $\psi$, $\tilde\psi$ and $\breve\psi$ are
absolutely integrable in the frequency domain, hence continuous in the time domain and
therefore of finite max-norm $\|\cdot\|_{L^\infty}$. Let $A$ denote the
lower frame bound of $SH(\phi,\psi,\tilde\psi,\breve\psi;c,\alpha)$.

Without loss of generality we can assume the scaling index $j$ to be sufficiently
large. To see this note that $\mathrm{supp}\, f \subset [0,1]^3$ and all elements in the
shearlet frame $SH(\phi,\psi,\tilde\psi,\breve\psi;c,\alpha)$ are compactly supported,
making the number of nonzero coefficients below a fixed scale $j_0$ finite. Since we are
aiming for an asymptotic estimate, this finite number of coefficients can be neglected.
This, in particular, means that we do not need to consider frame elements from the low
pass system $\Phi(\phi;c)$. Furthermore, it suffices to consider shearlets
$\Psi(\psi) = \{\psi_{j,k,m}\}$ associated with the pyramid $\mathcal{P}$ since the frame
elements $\tilde\psi_{j,k,m}$ and $\breve\psi_{j,k,m}$ can be handled analogously.

To simplify notation, we denote our shearlet elements by $\psi_\lambda$, where
$\lambda = (j,k,m)$ is indexing scale, shear, and position. We let $\Lambda_j$ be the
indexing sets of shearlets in $\Psi(\psi)$ at scale $j$, i.e.,

\[ \Psi(\psi) = \{\psi_\lambda : \lambda \in \Lambda_j,\ j \ge 0\}, \]

and collect these indices across scales as

\[ \Lambda = \bigcup_{j=0}^{\infty} \Lambda_j. \]

Our main concern will be to derive appropriate estimates for the shearlet
coefficients $\{\langle f, \psi_\lambda\rangle : \lambda \in \Lambda\}$ of $f$. Let
$c(f)^*_n$ denote the $n$th largest shearlet coefficient $\langle f, \psi_\lambda\rangle$
in absolute value. As mentioned in Section 3.3, to obtain the sought estimate on
$\|f - f_N\|_{L^2}^2$ in (6.2), it suffices (by Lemma 3.1) to show that the $n$th largest
shearlet coefficient $c(f)^*_n$ decays as specified by (6.3).

To derive the estimate in (6.3), we will study two separate cases. The first case
is for shearlet elements $\psi_\lambda$ that do not interact with the discontinuity
surface, and the second case for those elements that do.

Case 1. The compact support of the shearlet $\psi_\lambda$ does not intersect the boundary of
the set $B$, i.e., $|\mathrm{supp}\,\psi_\lambda \cap \partial B| = 0$.
Case 2. The compact support of the shearlet $\psi_\lambda$ does intersect the boundary of the
set $B$, i.e., $|\mathrm{supp}\,\psi_\lambda \cap \partial B| \neq 0$.

For Case 1 we will not be concerned with decay estimates of single coefficients
$\langle f, \psi_\lambda\rangle$, but with the decay of sums of coefficients over several
scales and all shears and translations. The frame property of the shearlet system, the
Sobolev smoothness of $f$ and a crude counting argument of the cardinality of the
essential indices $\lambda$ will basically be enough to provide the needed approximation
rate. We refer to Section 7 for the exact procedure.

For Case 2 we need to estimate each coefficient $\langle f, \psi_\lambda\rangle$
individually and, in particular, how $|\langle f, \psi_\lambda\rangle|$ decays with scale
$j$ and shearing $k$. We assume, in the remainder of this section, that $f_0 = 0$, whereby
$f = \chi_B f_1$. Depending on the orientation of the discontinuity surface, we will
split Case 2 into several subcases. The estimates in each subcase will, however, follow
the same principle: Let

\[ M = \mathrm{supp}\,\psi_\lambda \cap B. \]

Further, let $H$ be an affine hyperplane that intersects $M$ and thereby divides $M$ into
two sets $M_t$ and $M_l$. We thereby have that

\[ \langle f, \psi_\lambda\rangle = \langle \chi_{M_t} f, \psi_\lambda\rangle + \langle \chi_{M_l} f, \psi_\lambda\rangle. \]

The hyperplane will be chosen in such a way that $\mathrm{vol}(M_t)$ is sufficiently small.
In particular, $\mathrm{vol}(M_t)$ should be small enough so that the following estimate

\[ |\langle \chi_{M_t} f, \psi_\lambda\rangle| \le \|f\|_{L^\infty} \|\psi_\lambda\|_{L^\infty} \mathrm{vol}(M_t) \le C\,\mu\, 2^{j(\alpha+2)/4}\, \mathrm{vol}(M_t) \]

does not violate (6.3). We call estimates of this form, where we have restricted the
integration to a small part $M_t$ of $M$, truncated estimates (or the truncation term).

For the other term $\langle \chi_{M_l} f, \psi_\lambda\rangle$ we will have to integrate
over a possibly much larger part $M_l$ of $M$. To handle this we will use that
$\psi_\lambda$ only interacts with the discontinuity of $\chi_{M_l} f$ on an affine
hyperplane inside $M$. This part of the estimate is called the linearized estimate (or
the linearization term) since the discontinuity surface in
$\langle \chi_{M_l} f, \psi_\lambda\rangle$ has been reduced to a linear surface. In
$\langle \chi_{M_l} f, \psi_\lambda\rangle$ we are integrating over three variables, and we
will as the inner integration always choose to integrate along lines parallel to the
"singularity" hyperplane $H$. The important point here is that along all these line
integrals, the function $f$ is $C^\beta$-smooth without discontinuities on the entire
interval of integration. This is exactly the reason for removing the $M_t$ part from $M$.
Using the Fourier slice theorem we will then turn the line integrations along $H$ in the
spatial domain into two-dimensional plane integrations in the frequency domain. The
argumentation is as follows: Consider $g : \mathbb{R}^3 \to \mathbb{C}$ compactly
supported and continuous, and let $p : \mathbb{R}^2 \to \mathbb{C}$ be a projection of $g$
along, say, the $x_2$ axis, i.e., $p(x_1,x_3) = \int_{\mathbb{R}} g(x_1,x_2,x_3)\,dx_2$.
This immediately implies that $\hat p(\xi_1,\xi_3) = \hat g(\xi_1,0,\xi_3)$,
which is a simplified version of the Fourier slice theorem. By an inverse Fourier
transform, we then have

\[ \int_{\mathbb{R}} g(x_1,x_2,x_3)\,dx_2 = p(x_1,x_3) = \int_{\mathbb{R}^2} \hat g(\xi_1,0,\xi_3)\, e^{2\pi i \langle (x_1,x_3),(\xi_1,\xi_3)\rangle}\,d\xi_1\,d\xi_3, \tag{6.4} \]

and hence

\[ \Bigl| \int_{\mathbb{R}} g(x_1,x_2,x_3)\,dx_2 \Bigr| \le \int_{\mathbb{R}^2} |\hat g(\xi_1,0,\xi_3)|\,d\xi_1\,d\xi_3. \tag{6.5} \]

The left-hand side of (6.5) corresponds to line integrations of $g$ parallel to the
$x_1x_3$ plane. By applying shearing to the coordinates $x \in \mathbb{R}^3$, we can
transform $H$ into a plane of the form
$\{x \in \mathbb{R}^3 : x_1 = C_1, x_3 = C_2\}$, whereby we can apply (6.5) directly.
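The slice identity behind (6.4) and the bound (6.5) have exact discrete counterparts, which can be checked numerically. The following is a minimal sketch, assuming NumPy and a hypothetical random array `g` standing in for samples of a compactly supported function; it illustrates the discrete analogue only and is not part of the proof.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical samples of g on an 8 x 8 x 8 grid (axes: x1, x2, x3).
g = rng.standard_normal((8, 8, 8))

# p(x1, x3) = sum over x2 of g(x1, x2, x3): discrete analogue of the x2-projection.
p = g.sum(axis=1)

p_hat = np.fft.fft2(p)                  # 2D transform of the projection
g_hat_slice = np.fft.fftn(g)[:, 0, :]   # slice xi_2 = 0 of the 3D transform

# Discrete slice theorem: both arrays agree up to rounding.
slice_error = np.max(np.abs(p_hat - g_hat_slice))

# Discrete analogue of (6.5): |p| is bounded by the mean of |g_hat| over the slice,
# because p is the inverse 2D transform of that slice.
bound = np.abs(g_hat_slice).sum() / g_hat_slice.size
bound_holds = bool(np.all(np.abs(p) <= bound + 1e-9))
```

In the discrete setting the inverse transform carries a normalisation factor equal to the number of slice entries, which is why the bound uses the mean rather than the sum.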

Finally, the decay assumptions on $\hat\psi$ in Theorem 6.1 are then used to derive
decay estimates for $|\langle f, \psi_\lambda\rangle|$. Careful counting arguments will
enable us to arrive at the sought estimate in (6.3). We refer to Section 8 for a
detailed description of Case 2.

With the sought estimates derived in Sections 7 and 8, we then prove Theorem 6.1
in Section 9. The proof of Theorem 6.2 will follow the exact same organization
and setup as Theorem 6.1. Since the proofs are almost identical, in the proof of
Theorem 6.2, we will only focus on issues that need to be handled differently. The
proof of Theorem 6.2 is presented in Section 10.

We end this section by fixing some notation used in the sequel. Since we are
concerned with an asymptotic estimate, we will often simply use $C$ as a constant
although it might differ for each estimate; sometimes we will simply drop the constant
and use $\lesssim$ instead. We will also use the notation $r_j \sim s_j$ for
$r_j, s_j \in \mathbb{R}$, if $C_1 r_j \le s_j \le C_2 r_j$ with constants $C_1$ and $C_2$
independent of the scale $j$.

7. Analysis of shearlet coefficients away from the discontinuity surface.

In this section we derive estimates for the decay rate of the shearlet coefficients
$\langle f, \psi_\lambda\rangle$ for Case 1 described in the previous section. Hence, we
consider shearlets $\psi_\lambda$ whose support does not intersect the discontinuity
surface $\partial B$. This means that $f$ is $C^\beta$-smooth on the entire support of
$\psi_\lambda$, and we can therefore simply analyze shearlet coefficients
$\langle f, \psi_\lambda\rangle$ of functions $f \in C^\beta(\mathbb{R}^3)$ with
$\mathrm{supp}\, f \subset [0,1]^3$. The main result of this section, Proposition 7.3,
shows that $\|f - f_N\|_{L^2}^2 = O(N^{-2\beta/3+\varepsilon})$ as $N \to \infty$ for
any $\varepsilon > 0$, where $f_N$ is our $N$-term shearlet approximation. The result follows
easily from Proposition 7.2, which is similar in spirit to Proposition 7.3, but for the
case where $f \in H^\beta(\mathbb{R}^3)$. The proof builds on Lemma 7.1, which shows that
the system $\Psi(\psi)$ forms a weighted Bessel-like sequence with strong weights such
as $(2^{\alpha\beta j})_{j \ge 0}$ provided that the shearlet $\psi$ satisfies certain decay
conditions. Lemma 7.1 is, in turn, proved by transferring Sobolev differentiability of
the target function to decay properties in the Fourier domain and applying Lemma 5.6.

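Lemma 7.1. Let $g \in H^\beta(\mathbb{R}^3)$ with $\mathrm{supp}\, g \subset [0,1]^3$.
Suppose that $\psi \in L^2(\mathbb{R}^3)$ is $(\delta,\gamma)$-feasible for
$\delta > 2\gamma + \beta$, $\gamma > 3$. Then there exists a constant $B > 0$ such that

\[ \sum_{j=0}^{\infty} \sum_{|k| \le \lceil 2^{j(\alpha-1)/2} \rceil} \sum_{m \in \mathbb{Z}^3} 2^{\alpha\beta j}\, |\langle g, \psi_{j,k,m}\rangle|^2 \le B\, \|\partial^{(\beta,0,0)} g\|_{L^2}^2, \]

where $\partial^{(\beta,0,0)} g$ denotes the $\beta$-fractional partial derivative of
$g = g(x_1,x_2,x_3)$ with respect to $x_1$.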

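Proof. Since $\psi \in L^2(\mathbb{R}^3)$ is $(\delta,\gamma)$-feasible, we can choose
$\varphi \in L^2(\mathbb{R}^3)$ as

\[ (2\pi i \xi_1)^\beta\, \hat\varphi(\xi) = \hat\psi(\xi) \quad \text{for } \xi \in \mathbb{R}^3, \]

hence $\psi$ is the $\partial^{(\beta,0,0)}$-fractional derivative of $\varphi$. This is
well-defined due to the decay assumptions on $\hat\psi$. By definition of the fractional
derivative, it follows that

\[ |\langle \partial^{(\beta,0,0)} g, \varphi_{j,k,m}\rangle|^2
= |\langle (2\pi i \xi_1)^\beta \hat g, \hat\varphi_{j,k,m}\rangle|^2
= |\langle g, \partial^{(\beta,0,0)} \varphi_{j,k,m}\rangle|^2
= 2^{\alpha\beta j}\, |\langle g, \psi_{j,k,m}\rangle|^2, \]

where we have used that
$\partial^{(\beta,0,0)} f_{j,k,m} = (2^{\alpha j/2})^\beta (\partial^{(\beta,0,0)} f)_{j,k,m}$
for $f \in H^\beta(\mathbb{R}^3)$. A straightforward computation shows that $\varphi$
satisfies the hypotheses of Lemma 5.6, and an application of Lemma 5.6 then yields

\[ \sum_{j=0}^{\infty} \sum_{|k| \le \lceil 2^{j(\alpha-1)/2} \rceil} \sum_{m \in \mathbb{Z}^3} 2^{\alpha\beta j}\, |\langle g, \psi_{j,k,m}\rangle|^2
= \sum_{j=0}^{\infty} \sum_{|k| \le \lceil 2^{j(\alpha-1)/2} \rceil} \sum_{m \in \mathbb{Z}^3} |\langle \partial^{(\beta,0,0)} g, \varphi_{j,k,m}\rangle|^2
\le B\, \|\partial^{(\beta,0,0)} g\|_{L^2}^2, \]

which completes the proof. □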

We are now ready to prove the following result. 

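Proposition 7.2. Let $g \in H^\beta(\mathbb{R}^3)$ with $\mathrm{supp}\, g \subset [0,1]^3$.
Suppose that $\psi \in L^2(\mathbb{R}^3)$ is compactly supported and
$(\delta,\gamma)$-feasible for $\delta > 2\gamma + \beta$ and $\gamma > 3$. Then

\[ \sum_{n > N} |c(g)^*_n|^2 \lesssim N^{-2\beta/3} \quad \text{as } N \to \infty, \]

where $c(g)^*_n$ is the $n$th largest coefficient $\langle g, \psi_\lambda\rangle$ in
modulus for $\psi_\lambda \in \Psi(\psi)$.

Proof. Set

\[ \tilde\Lambda_j = \{\lambda \in \Lambda_j : \mathrm{supp}\,\psi_\lambda \cap \mathrm{supp}\, g \neq \emptyset\}, \quad j \ge 0, \]

i.e., $\tilde\Lambda_j$ is the set of indices in $\Lambda_j$ associated with shearlets whose
support intersects the support of $g$. Then, for each scale $J > 0$, we have

\[ N_J := \Bigl| \bigcup_{j=0}^{J-1} \tilde\Lambda_j \Bigr| \sim \sum_{j=0}^{J-1} (2^{j(\alpha-1)/2})^2\, 2^{j\alpha/2}\, 2^{j/2}\, 2^{j/2} \sim 2^{(3/2)\alpha J}, \tag{7.1} \]

where the term $(2^{j(\alpha-1)/2})^2$ is due to the number of shearings
$|k| = |(k_1,k_2)| \le 2^{j(\alpha-1)/2}$ at scale $j$ and the term
$2^{j\alpha/2}\, 2^{j/2}\, 2^{j/2}$ is due to the number of translations for which $g$ and
$\psi_\lambda$ interact; recall that $\psi_\lambda$ has support in a set of measure
$2^{-j\alpha/2} \cdot 2^{-j/2} \cdot 2^{-j/2}$.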
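The growth rate of the index count can be illustrated by summing the per-scale counts directly. The sketch below is an order-of-magnitude model only: the function name `shearlet_count` and the choice $\alpha = 1.5$ are illustrative assumptions, and all constants are ignored.

```python
def shearlet_count(J, alpha):
    """Model count of shearlets at scales 0..J-1 that meet supp g (constants dropped)."""
    total = 0.0
    for j in range(J):
        shears = (2 ** (j * (alpha - 1) / 2)) ** 2                      # pairs (k1, k2)
        translates = 2 ** (j * alpha / 2) * 2 ** (j / 2) * 2 ** (j / 2)  # positions m
        total += shears * translates
    return total

alpha = 1.5
# Ratio of the model count to 2^{(3/2) alpha J}; it should settle to a constant.
ratios = [shearlet_count(J, alpha) / 2 ** (1.5 * alpha * J) for J in (10, 20, 30)]
limit = 1.0 / (2 ** (1.5 * alpha) - 1.0)   # limit of the geometric sum
```

Each summand equals $2^{(3/2)\alpha j}$, so the sum is geometric and the ratio converges to a fixed constant, matching the relation $N_J \sim 2^{(3/2)\alpha J}$.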

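We observe that there exists some $C > 0$ such that

\[ \sum_{j_0=1}^{\infty} 2^{\alpha\beta j_0} \sum_{n > N_{j_0}} |c(g)^*_n|^2
\le C \cdot \sum_{j_0=1}^{\infty} \sum_{j=j_0}^{\infty} \sum_{k,m} 2^{\alpha\beta j_0}\, |\langle g, \psi_{j,k,m}\rangle|^2
= C \cdot \sum_{j=1}^{\infty} \sum_{k,m} |\langle g, \psi_{j,k,m}\rangle|^2 \Bigl( \sum_{j_0=1}^{j} 2^{\alpha\beta j_0} \Bigr). \]

By Lemma 7.1, this yields

\[ \sum_{j_0=1}^{\infty} 2^{\alpha\beta j_0} \sum_{n > N_{j_0}} |c(g)^*_n|^2
\le C \cdot \sum_{j=1}^{\infty} \sum_{k,m} 2^{\alpha\beta j}\, |\langle g, \psi_{j,k,m}\rangle|^2 < \infty, \]

and thus, by (7.1), that

\[ \sum_{n > N_{j_0}} |c(g)^*_n|^2 \lesssim 2^{-\alpha\beta j_0} \sim N_{j_0}^{-2\beta/3}. \]

Finally, let $N > 0$. Then there exists a positive integer $j_0 > 0$ such that

\[ N \sim N_{j_0} \sim 2^{(3/2)\alpha j_0}, \]

which completes the proof. □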

We can get rid of the Sobolev space requirement in Proposition 7.2 if we accept 

a slightly worse decay rate. 

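Proposition 7.3. Let $f \in C^\beta(\mathbb{R}^3)$ with $\mathrm{supp}\, f \subset [0,1]^3$.
Suppose that $\psi \in L^2(\mathbb{R}^3)$ is compactly supported and
$(\delta,\gamma)$-feasible for $\delta > 2\gamma + \beta$ and $\gamma > 3$. Then

\[ \sum_{n > N} |c(f)^*_n|^2 \lesssim N^{-2\beta/3+\varepsilon} \quad \text{as } N \to \infty, \]

for any $\varepsilon > 0$.

Proof. By the intrinsic characterization of fractional order Sobolev spaces [1], we
see that $C_0^\beta(\mathbb{R}^3) \subset H_0^{\beta-\varepsilon}(\mathbb{R}^3)$ for any
$\varepsilon > 0$. The result now follows from Proposition 7.2. □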

8. Analysis of shearlet coefficients associated with the discontinuity
surface. We now turn our attention to Case 2. Here we have to estimate those
shearlet coefficients whose support intersects the discontinuity surface. For any scale
$j \ge 0$ and any grid point $p \in \mathbb{Z}^3$, we let $Q_{j,p}$ denote the dyadic cube
defined by

\[ Q_{j,p} = [-2^{-j/2}, 2^{-j/2}]^3 + 2^{-j/2}\, 2p. \]

We let $\mathcal{Q}_j$ be the collection of those dyadic cubes $Q_{j,p}$ at scale $j$ whose
interior $\mathrm{int}(Q_{j,p})$ intersects $\partial B$, i.e.,

\[ \mathcal{Q}_j = \{Q_{j,p} : \mathrm{int}(Q_{j,p}) \cap \partial B \neq \emptyset,\ p \in \mathbb{Z}^3\}. \]

Of interest to us are not only the dyadic cubes, but also the shearlet indices associated
with shearlets intersecting the discontinuity surface inside some
$Q_{j,p} \in \mathcal{Q}_j$, i.e., for $j \ge 0$ and $p \in \mathbb{Z}^3$ with
$Q_{j,p} \in \mathcal{Q}_j$, we will consider the index set

\[ \Lambda_{j,p} = \{\lambda \in \Lambda_j : \mathrm{int}(\mathrm{supp}\,\psi_\lambda) \cap \mathrm{int}(Q_{j,p}) \cap \partial B \neq \emptyset\}. \]

Further, for $j \ge 0$, $p \in \mathbb{Z}^3$, and $0 < \varepsilon < 1$, we define
$\Lambda_{j,p}(\varepsilon)$ to be the index set of shearlets $\psi_\lambda$,
$\lambda \in \Lambda_{j,p}$, such that the magnitude of the corresponding shearlet
coefficient $\langle f, \psi_\lambda\rangle$ is larger than $\varepsilon$ and the support
of $\psi_\lambda$ intersects $Q_{j,p}$ at the $j$th scale, i.e.,

\[ \Lambda_{j,p}(\varepsilon) = \{\lambda \in \Lambda_{j,p} : |\langle f, \psi_\lambda\rangle| > \varepsilon\}. \]

The collection of such shearlet indices across scales and translates will be denoted by
$\Lambda(\varepsilon)$, i.e.,

\[ \Lambda(\varepsilon) = \bigcup_{j,p} \Lambda_{j,p}(\varepsilon). \]

As mentioned in Section 6.2, we may assume that $j$ is sufficiently large. Suppose
$Q_{j,p} \in \mathcal{Q}_j$ for some given scale $j \ge 0$ and position
$p \in \mathbb{Z}^3$. Then the set

\[ \mathcal{S}_{j,p} = \bigcup_{\lambda \in \Lambda_{j,p}} \mathrm{supp}\,\psi_\lambda \]

is contained in a cube of size $C \cdot 2^{-j/2}$ by $C \cdot 2^{-j/2}$ by
$C \cdot 2^{-j/2}$ and is, thereby, asymptotically of the same size as $Q_{j,p}$.

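We now restrict ourselves to considering $B \in STAR^\alpha(\nu)$; the piecewise case
$B \in STAR^\alpha(\nu,L)$ will be dealt with in Section 10. By the smoothness assumption on
the discontinuity surface $\partial B$, the discontinuity surface can locally be
parametrized by either $(x_1, x_2, E(x_1,x_2))$, $(x_1, E(x_1,x_3), x_3)$, or
$(E(x_2,x_3), x_2, x_3)$ with $E \in C^\alpha$ in the interior of $\mathcal{S}_{j,p}$ for
sufficiently large $j$. In other words, the part of the discontinuity surface
$\partial B$ contained in $\mathcal{S}_{j,p}$ can be described as the graph
$x_3 = E(x_1,x_2)$, $x_2 = E(x_1,x_3)$, or $x_1 = E(x_2,x_3)$ of a $C^\alpha$ function.

Thus, we are facing the following two cases:
Case 2a. The discontinuity surface $\partial B$ can be parametrized by
$(E(x_2,x_3), x_2, x_3)$ with $E \in C^\alpha$ in the interior of $\mathcal{S}_{j,p}$ such
that, for any $\lambda \in \Lambda_{j,p}$, we have

\[ |\partial^{(1,0)} E(\hat x_2, \hat x_3)| < +\infty \quad \text{and} \quad |\partial^{(0,1)} E(\hat x_2, \hat x_3)| < +\infty, \]

for all $\hat x = (\hat x_1, \hat x_2, \hat x_3) \in \mathrm{int}(Q_{j,p}) \cap \mathrm{int}(\mathrm{supp}\,\psi_\lambda) \cap \partial B$.
Case 2b. The discontinuity surface $\partial B$ can be parametrized by
$(x_1, x_2, E(x_1,x_2))$ or $(x_1, E(x_1,x_3), x_3)$ with $E \in C^\alpha$ in the interior
of $\mathcal{S}_{j,p}$ such that, for any $\lambda \in \Lambda_{j,p}$, there exists some
$\hat x = (\hat x_1, \hat x_2, \hat x_3) \in \mathrm{int}(Q_{j,p}) \cap \mathrm{int}(\mathrm{supp}\,\psi_\lambda) \cap \partial B$
satisfying

\[ \partial^{(1,0)} E(\hat x_1, \hat x_2) = 0 \quad \text{or} \quad \partial^{(1,0)} E(\hat x_1, \hat x_3) = 0. \]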

8.1. Hyperplane discontinuity. As described in Section 6.2, the linearized
estimates of the shearlet coefficients will be one of the key estimates in proving
Theorem 6.1. Linearized estimates are used in the slightly simplified situation, where
the discontinuity surface is linear. Since such an estimate is interesting in its own
right, we state and prove a linearized estimation result below. Moreover, we will use the
methods developed in the proof repeatedly in the remaining sections of the paper.
In the proof, we will see that the shearing operation is indeed very effective when
analyzing hyperplane singularities.

Theorem 8.1. Let $\psi \in L^2(\mathbb{R}^3)$ be compactly supported, and assume that
$\psi$ satisfies conditions (i) and (ii) of Theorem 6.1. Further, let
$\lambda \in \Lambda_{j,p}$ for $j \ge 0$ and $p \in \mathbb{Z}^3$. Suppose that
$f \in \mathcal{E}^\beta_\alpha(\mathbb{R}^3)$ for $1 < \alpha \le \beta \le 2$ and that
$\partial B$ is linear on the support of $\psi_\lambda$ in the sense that

\[ \mathrm{supp}\,\psi_\lambda \cap \partial B \subset H \]

for some affine hyperplane $H$ of $\mathbb{R}^3$. Then,

(i) if $H$ has normal vector $(-1, s_1, s_2)$ with $s_1 \le 3$ and $s_2 \le 3$,

\[ |\langle f, \psi_\lambda\rangle| \le C \cdot \min_{i=1,2} \Bigl\{ \frac{2^{-j(\alpha/4+1/2)}}{|k_i + 2^{j(\alpha-1)/2} s_i|^{3}} \Bigr\}, \tag{8.1} \]

for some constant $C > 0$.

(ii) if $H$ has normal vector $(-1, s_1, s_2)$ with $s_1 \ge 3/2$ or $s_2 \ge 3/2$,

\[ |\langle f, \psi_\lambda\rangle| \le C \cdot 2^{-j(\alpha/4+1/2+\alpha\beta/2)}, \tag{8.2} \]

for some constant $C > 0$.

(iii) if $H$ has normal vector $(0, s_1, s_2)$ with $s_1, s_2 \in \mathbb{R}$, then (8.2) holds.

Proof. Let us fix $(j,k,m) \in \Lambda_{j,p}$ and $f \in \mathcal{E}^\beta_\alpha(\mathbb{R}^3)$.
We can without loss of generality assume that $f$ is only nonzero on $B$. We first
consider the cases (i) and (ii). The hyperplane can be written as

\[ H = \{x \in \mathbb{R}^3 : \langle x - x_0, (-1, s_1, s_2)\rangle = 0\} \]

for some $x_0 \in \mathbb{R}^3$. We shear the hyperplane by $S_{-s}$ for $s = (s_1, s_2)$
and obtain

\begin{align*}
S_{-s} H &= \{x \in \mathbb{R}^3 : \langle S_s x - x_0, (-1, s_1, s_2)\rangle = 0\} \\
&= \{x \in \mathbb{R}^3 : \langle x - S_{-s} x_0, (S_s)^T (-1, s_1, s_2)\rangle = 0\} \\
&= \{x \in \mathbb{R}^3 : \langle x - S_{-s} x_0, (-1, 0, 0)\rangle = 0\} \\
&= \{x = (x_1, x_2, x_3) \in \mathbb{R}^3 : x_1 = \hat x_1\}, \quad \text{where } \hat x = S_{-s} x_0,
\end{align*}

which is a hyperplane parallel to the $x_2x_3$ plane. Here the power of shearlets comes
into play since it will allow us to only consider hyperplane singularities parallel to the
$x_2x_3$ plane. Of course, this requires that we also modify the shear parameter of the
shearlet, that is, we will consider the right hand side of

\[ \langle f, \psi_{j,k,m}\rangle = \langle f(S_s \cdot), \psi_{j,\hat k,m}\rangle \]

with the new shear parameter $\hat k$ defined by $\hat k_1 = k_1 + 2^{j(\alpha-1)/2} s_1$ and
$\hat k_2 = k_2 + 2^{j(\alpha-1)/2} s_2$. The integrand in
$\langle f(S_s \cdot), \psi_{j,\hat k,m}\rangle$ has the singularity plane exactly located
on $x_1 = \hat x_1$, i.e., on $S_{-s} H$.
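The key algebraic step, that the transposed shear maps the normal $(-1, s_1, s_2)$ to $(-1, 0, 0)$, can be checked numerically. The sketch below assumes the shear matrix has the form $S_s = \begin{pmatrix} 1 & s_1 & s_2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$, which is taken here as an assumption about the definition given earlier in the paper; the slopes and the base point are hypothetical values.

```python
import numpy as np

def shear_matrix(s1, s2):
    """Assumed form of the shear matrix S_s acting on column vectors (x1, x2, x3)."""
    return np.array([[1.0, s1, s2],
                     [0.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])

s1, s2 = 0.7, -1.3                       # hypothetical slopes of the hyperplane H
normal = np.array([-1.0, s1, s2])

# Normal of the sheared hyperplane S_{-s} H.
sheared_normal = shear_matrix(s1, s2).T @ normal

# Two points of H are mapped by S_{-s} to points with the same x1 coordinate.
x0 = np.array([0.2, -0.4, 0.9])
direction_in_H = np.array([s1, 1.0, 0.0])    # orthogonal to the normal of H
point_on_H = x0 + 2.5 * direction_in_H
S_inv = shear_matrix(-s1, -s2)               # inverse shear S_{-s}
x1_gap = (S_inv @ point_on_H)[0] - (S_inv @ x0)[0]
```

Both checks express the same fact: after shearing, the singularity plane is of the form $x_1 = \text{const}$.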

To simplify the expression for the integration bounds, we will fix a new origin on
$S_{-s} H$, that is, on $x_1 = \hat x_1$; the $x_2$ and $x_3$ coordinates of the new origin
will be fixed in the next paragraph. Since $f$ is assumed to be only nonzero on $B$, the
function $f$ will be equal to zero on one side of $S_{-s} H$, say, $x_1 < \hat x_1$. It
therefore suffices to estimate

\[ \langle f_0(S_s \cdot)\chi_\Omega, \psi_{j,\hat k,m}\rangle \]

for $f_0 \in C^\beta(\mathbb{R}^3)$ and $\Omega = \mathbb{R}_+ \times \mathbb{R}^2$. We first
consider the case $|\hat k_1| \le |\hat k_2|$. We further assume that $\hat k_1 < 0$ and
$\hat k_2 < 0$. The other cases can be handled similarly.

Since $\psi$ is compactly supported, there exists some $L > 0$ such that
$\mathrm{supp}\,\psi \subset [-L,L]^3$. By a rescaling argument, we can assume $L = 1$. Let

\[ \mathcal{P}_{j,k} := \{x \in \mathbb{R}^3 : |2^{j\alpha/2} x_1 + 2^{j/2} k_1 x_2 + 2^{j/2} k_2 x_3| \le 1,\ |x_2|, |x_3| \le 2^{-j/2}\}. \tag{8.3} \]

With this notation, we have $\mathrm{supp}\,\psi_{j,k,0} \subset \mathcal{P}_{j,k}$. We say
that the shearlet normal direction of the shearlet box $\mathcal{P}_{j,0}$ is $(1,0,0)$,
thus the shearlet normal of a sheared element $\psi_{j,k,m}$ associated with
$\mathcal{P}_{j,k}$ is $(1, k_1/2^{j(\alpha-1)/2}, k_2/2^{j(\alpha-1)/2})$. Now, we fix our
origin so that, relative to this new origin, it holds that

\[ \mathrm{supp}\,\psi_{j,\hat k,m} \subset \mathcal{P}_{j,\hat k} + (2^{-j\alpha/2}, 0, 0) =: \tilde{\mathcal{P}}_{j,\hat k}. \]

Then one face of $\tilde{\mathcal{P}}_{j,\hat k}$ intersects the origin.

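For a fixed $|\hat x_3| \le 2^{-j/2}$, we consider the cross section of the parallelepiped
$\tilde{\mathcal{P}}_{j,\hat k}$ on the hyperplane $x_3 = \hat x_3$. This cross section will
be a parallelogram with sides

\[ x_2 = \pm 2^{-j/2}, \]

\[ 2^{j\alpha/2} x_1 + 2^{j/2} \hat k_1 x_2 + 2^{j/2} \hat k_2 x_3 = 0, \quad \text{and} \quad 2^{j\alpha/2} x_1 + 2^{j/2} \hat k_1 x_2 + 2^{j/2} \hat k_2 x_3 = 2. \]

As it is only a matter of scaling we replace the right hand side of the last equation
with 1 for simplicity. Solving the two last equalities for $x_2$ gives the following lines
on the hyperplane $x_3 = \hat x_3$:

\[ L_1 : x_2 = -\frac{2^{j(\alpha-1)/2}}{\hat k_1} x_1 - \frac{\hat k_2}{\hat k_1} x_3, \quad \text{and} \quad
L_2 : x_2 = -\frac{2^{j(\alpha-1)/2}}{\hat k_1} x_1 - \frac{\hat k_2}{\hat k_1} x_3 + \frac{2^{-j/2}}{\hat k_1}. \]

We therefore have

\[ |\langle f_0(S_s \cdot)\chi_\Omega, \psi_{j,\hat k,m}\rangle| \lesssim
\Bigl| \int_{-2^{-j/2}}^{2^{-j/2}} \int_0^{K_1} \int_{L_2}^{L_1} f_0(S_s x)\, \psi_{j,\hat k,m}(x)\, dx_2\, dx_1\, dx_3 \Bigr|, \tag{8.4} \]

where the upper integration bound for $x_1$ is
$K_1 = 2^{-j\alpha/2} - 2^{-j\alpha/2} \hat k_1 - 2^{-j(\alpha-1)/2} \hat k_2 x_3$,
which follows from solving $L_2$ for $x_1$ and using that $|x_2| \le 2^{-j/2}$. We remark
that the inner integration over $x_2$ is along lines parallel to the singularity plane
$\partial\Omega = \{0\} \times \mathbb{R}^2$; as mentioned, this allows us to better handle
the singularity and will be used several times throughout this paper.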

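For a fixed $|\hat x_3| \le 2^{-j/2}$, we consider the one-dimensional Taylor expansion for
$f_0(S_s \cdot)$ at each point $x = (x_1, x_2, x_3) \in L_2$ in the $x_2$-direction:

\begin{align*}
f_0(S_s x) = {} & a(x_1,x_3) + b(x_1,x_3) \Bigl( x_2 + \frac{2^{j(\alpha-1)/2}}{\hat k_1} x_1 + \frac{\hat k_2}{\hat k_1} x_3 - \frac{2^{-j/2}}{\hat k_1} \Bigr) \\
& + c(x_1,x_2,x_3) \Bigl( x_2 + \frac{2^{j(\alpha-1)/2}}{\hat k_1} x_1 + \frac{\hat k_2}{\hat k_1} x_3 - \frac{2^{-j/2}}{\hat k_1} \Bigr)^{\beta},
\end{align*}

where $a(x_1,x_3)$, $b(x_1,x_3)$ and $c(x_1,x_2,x_3)$ are all bounded in absolute value by
$C(1 + |s_1|)^\beta$. Using this Taylor expansion in (8.4) yields

\[ |\langle f_0(S_s \cdot)\chi_\Omega, \psi_{j,\hat k,m}\rangle| \lesssim (1 + |s_1|)^\beta
\int_{-2^{-j/2}}^{2^{-j/2}} \int_0^{K_1} \sum_{l=1}^{3} I_l(x_1,x_3)\, dx_1\, dx_3, \tag{8.5} \]

where

\begin{align*}
I_1(x_1,x_3) &= \Bigl| \int_{L_2}^{L_1} \psi_{j,\hat k,m}(x)\, dx_2 \Bigr|, \\
I_2(x_1,x_3) &= \Bigl| \int_{L_2}^{L_1} (x_2 + K_2)\, \psi_{j,\hat k,m}(x)\, dx_2 \Bigr|, \\
I_3(x_1,x_3) &= \Bigl| \int_{0}^{-2^{-j/2}/\hat k_1} (x_2)^\beta\, \psi_{j,\hat k,m}(x_1, x_2 - K_2, x_3)\, dx_2 \Bigr|,
\end{align*}

and

\[ K_2 = \frac{2^{j(\alpha-1)/2}}{\hat k_1} x_1 + \frac{\hat k_2}{\hat k_1} x_3 - \frac{2^{-j/2}}{\hat k_1}. \]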

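We next estimate each of the integrals $I_1$, $I_2$, and $I_3$ separately. We start with
estimating $I_1(x_1,x_3)$. The Fourier Slice Theorem (6.4) yields directly that

\[ I_1(x_1,x_3) = \Bigl| \int_{\mathbb{R}} \psi_{j,\hat k,m}(x)\, dx_2 \Bigr|
= \Bigl| \int_{\mathbb{R}^2} \hat\psi_{j,\hat k,m}(\xi_1, 0, \xi_3)\, e^{2\pi i \langle (x_1,x_3),(\xi_1,\xi_3)\rangle}\, d\xi_1\, d\xi_3 \Bigr|. \]

By assumptions (i) and (ii) from Theorem 6.1, we have, for all
$\xi = (\xi_1,\xi_2,\xi_3) \in \mathbb{R}^3$,

\[ |\hat\psi_{j,\hat k,m}(\xi)| \lesssim 2^{-j(\alpha+2)/4}\, |h(2^{-j\alpha/2}\xi_1)|
\Bigl( 1 + \Bigl| \frac{2^{-j/2}\xi_2}{2^{-j\alpha/2}\xi_1} + \hat k_1 \Bigr| \Bigr)^{-\gamma}
\Bigl( 1 + \Bigl| \frac{2^{-j/2}\xi_3}{2^{-j\alpha/2}\xi_1} + \hat k_2 \Bigr| \Bigr)^{-\gamma} \]

for some $h \in L^1(\mathbb{R})$. Hence, we can continue our estimate of $I_1$:

\[ I_1(x_1,x_3) \lesssim \int_{\mathbb{R}^2} 2^{-j(\alpha+2)/4}\, |h(2^{-j\alpha/2}\xi_1)|\, (1 + |\hat k_1|)^{-\gamma}
\Bigl( 1 + \Bigl| \frac{2^{-j/2}\xi_3}{2^{-j\alpha/2}\xi_1} + \hat k_2 \Bigr| \Bigr)^{-\gamma} d\xi_1\, d\xi_3, \]

and further, by a change of variables,

\[ I_1(x_1,x_3) \lesssim \int_{\mathbb{R}^2} 2^{j\alpha/4}\, |h(\xi_1)|\, (1 + |\hat k_1|)^{-\gamma}
\Bigl( 1 + \Bigl| \frac{\xi_3}{\xi_1} + \hat k_2 \Bigr| \Bigr)^{-\gamma} d\xi_1\, d\xi_3
\lesssim 2^{j\alpha/4} (1 + |\hat k_1|)^{-\gamma}, \]

since $h \in L^1(\mathbb{R})$ and $(1 + |\xi_3/\xi_1 + \hat k_2|)^{-\gamma} = O(1)$ as
$|\xi_1| \to \infty$ for fixed $\xi_3$.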

We estimate $I_2(x_1,x_3)$ by



-^2(3:1, 3:3) < 



/ 

Jr 



^2V'j,fc,„(a;)da;2 



IK. 



V'j,fc,m(3^)da^2 



=: 5i + S2 



Applying the Fourier Slice Theorem again and then utilizing the decay assumptions
on $\hat\psi$ yields



Si 



< 



2^2V'j-fc,„(2:)da;2 
d 



< 



2i (a/4- 1/2) 



Since $|x_1| \le -\hat k_1/2^{j\alpha/2}$ and $|x_3| \le 2^{-j/2}$, we have that



i _|_ 0~J/^ 



The following estimate of $S_2$ then follows directly from the estimate of $I_1$:

S2 < \K2\V'^I^ (1 + \kx\)~^ < 2J(«/4-i/2)(l + 

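From the last two estimates, we conclude that

\[ I_2(x_1,x_3) \lesssim \frac{2^{j(\alpha/4-1/2)}}{(1 + |\hat k_1|)^{\gamma-1}}. \]

Finally, we estimate $I_3(x_1,x_3)$ by

\[ I_3(x_1,x_3) \le \Bigl| \int_{0}^{-2^{-j/2}/\hat k_1} (x_2)^\beta\, \|\psi_{j,\hat k,m}\|_{L^\infty}\, dx_2 \Bigr|
\lesssim 2^{j(\alpha/4+1/2)} \Bigl| \int_{0}^{-2^{-j/2}/\hat k_1} (x_2)^\beta\, dx_2 \Bigr|
\lesssim \frac{2^{j(\alpha/4-\beta/2)}}{|\hat k_1|^{\beta+1}}. \]

Having estimated $I_1$, $I_2$ and $I_3$, we continue with (8.5) and obtain

\[ |\langle f_0(S_s \cdot)\chi_\Omega, \psi_{j,\hat k,m}\rangle| \lesssim (1 + |s_1|)^\beta
\Bigl( \frac{2^{-j(\alpha/4+1/2)}}{(1 + |\hat k_1|)^{\gamma-1}} + \frac{2^{-j(\alpha/4+1/2+\beta/2)}}{|\hat k_1|^{\beta}} \Bigr). \]

By performing a similar analysis for the case $|\hat k_2| \le |\hat k_1|$, we arrive at

\[ |\langle f_0(S_s \cdot)\chi_\Omega, \psi_{j,\hat k,m}\rangle| \lesssim \min_{i=1,2} \Bigl\{ (1 + |s_i|)^\beta
\Bigl( \frac{2^{-j(\alpha/4+1/2)}}{(1 + |\hat k_i|)^{\gamma-1}} + \frac{2^{-j(\alpha/4+1/2+\beta/2)}}{|\hat k_i|^{\beta}} \Bigr) \Bigr\}. \tag{8.6} \]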

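Suppose that $s_1 \le 3$ and $s_2 \le 3$. Then (8.6) reduces to

\begin{align*}
|\langle f, \psi_{j,k,m}\rangle| &\lesssim \min_{i=1,2} \Bigl\{ \frac{2^{-j(\alpha/4+1/2)}}{(1 + |\hat k_i|)^{\gamma-1}} + \frac{2^{-j(\alpha/4+\beta/2+1/2)}}{|\hat k_i|^{\beta}} \Bigr\} \\
&\lesssim \min_{i=1,2} \Bigl\{ \frac{2^{-j(\alpha/4+1/2)}}{|k_i + 2^{j(\alpha-1)/2} s_i|^{3}} \Bigr\},
\end{align*}

since $\gamma \ge 4$ and $\beta \ge \alpha$. On the other hand, if $s_1 \ge 3/2$ or $s_2 \ge 3/2$, then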

|(/,V',-,.,™)|<2-^-(«/2+^/^)". 

To see this, note that 



mm\{l + \s,\r 



i=l,2 



2-j(a/4+/3/2+l/2) 

{i + \k\y 



mm 

1=1,2 



< 



i-.i^ + ki)/ Si + 2^('^-^)/^\y , 

2-j(a/4+/3/2+l/2) 



2-j(a/4+l/2+a/3/2)_ 



This completes the proof of the estimates (8.1) and (8.2) in (i) and (ii), respectively. 

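Finally, we need to consider the case (iii) in which the normal vector of the
hyperplane $H$ is of the form $(0, s_1, s_2)$ for $s_1, s_2 \in \mathbb{R}$. Let
$\tilde\Omega = \{x \in \mathbb{R}^3 : s_1 x_2 \ge -s_2 x_3\}$. As in the first part of the
proof, it suffices to consider $\langle \chi_{\tilde\Omega} f_0, \psi_{j,k,m}\rangle$, where
$\mathrm{supp}\,\psi_{j,k,m} \subset \mathcal{P}_{j,k} - (2^{-j\alpha/2}, 0, 0) = \tilde{\mathcal{P}}_{j,k}$
with respect to some new origin. As before the boundary of $\tilde{\mathcal{P}}_{j,k}$
intersects the origin. By assumptions (i) and (ii) from Theorem 6.1, we have that

\[ \Bigl( \frac{\partial}{\partial \xi_1} \Bigr)^{\ell} \hat\psi(0, \xi_2, \xi_3) = 0 \quad \text{for } \ell = 0, 1, \]

which implies that

\[ \int_{\mathbb{R}} x_1^{\ell}\, \psi(x)\, dx_1 = 0 \quad \text{for all } x_2, x_3 \in \mathbb{R} \text{ and } \ell = 0, 1. \]

Therefore, we have

\[ \int_{\mathbb{R}} x_1^{\ell}\, \psi(S_k x)\, dx_1 = 0 \quad \text{for all } x_2, x_3 \in \mathbb{R},\ k = (k_1, k_2) \in \mathbb{R}^2, \text{ and } \ell = 0, 1, \tag{8.7} \]

since shearing operations $S_k$ preserve vanishing moments along the $x_1$ axis. Since
the $x_1$ axis is in a direction parallel to the singularity plane $\partial\tilde\Omega$, we
employ Taylor expansion of $f_0$ in this direction. By (8.7) everything but the last term
in the Taylor expansion disappears, and we obtain

\begin{align*}
|\langle \chi_{\tilde\Omega} f_0, \psi_{j,k,m}\rangle| &\lesssim 2^{j(\alpha/4+1/2)} \int_{-2^{-j/2}}^{2^{-j/2}} \int_{-2^{-j/2}}^{2^{-j/2}} \int_{-2^{-j\alpha/2}}^{2^{-j\alpha/2}} |x_1|^\beta\, dx_1\, dx_2\, dx_3 \\
&\lesssim 2^{j(\alpha/4+1/2)}\, 2^{-j}\, 2^{-j(\beta+1)\alpha/2} = 2^{-j(\alpha/4+1/2+\alpha\beta/2)},
\end{align*}

which proves claim (iii). □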
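The claim that shearing preserves vanishing moments along the $x_1$ axis reduces to the fact that a shift in $x_1$ leaves the moments of order 0 and 1 equal to zero. The sketch below checks this numerically for a stand-in profile, the second derivative of a Gaussian; this profile is an assumption made for illustration (it is not compactly supported and is not one of the paper's generators), and only the $x_1$-dependence is modelled.

```python
import numpy as np

def psi_1d(u):
    """Second derivative of exp(-u^2): moments of order 0 and 1 vanish."""
    return (4.0 * u**2 - 2.0) * np.exp(-u**2)

x1 = np.linspace(-15.0, 15.0, 60001)
dx = x1[1] - x1[0]

def moment_after_shear(ell, k1, k2, x2, x3):
    """Riemann-sum approximation of the integral of x1^ell * psi(S_k x) over x1."""
    sheared_first_coordinate = x1 + k1 * x2 + k2 * x3   # first coordinate of S_k x
    return np.sum(x1**ell * psi_1d(sheared_first_coordinate)) * dx

# Hypothetical shear parameters and a fixed point (x2, x3).
moments = [moment_after_shear(ell, k1=2.0, k2=-1.0, x2=0.3, x3=0.5) for ell in (0, 1)]
```

For fixed $(x_2, x_3)$ the shear acts as the shift $x_1 \mapsto x_1 + k_1 x_2 + k_2 x_3$, so the first moment of the shifted profile is a combination of the zeroth and first moments of the original one, both of which vanish.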

8.2. General $C^\alpha$-smooth discontinuity. We now extend the result from the
previous section, Theorem 8.1, from a linear discontinuity surface to a general,
non-linear $C^\alpha$-smooth discontinuity surface. To achieve this, we will mainly focus
on the truncation arguments since the linearized estimates can be handled by the
machinery developed in the previous subsection.

Theorem 8.2. Let $\psi \in L^2(\mathbb{R}^3)$ be compactly supported, and assume that
$\psi$ satisfies conditions (i) and (ii) of Theorem 6.1. Further, let $j \ge 0$ and
$p \in \mathbb{Z}^3$, and let $\lambda \in \Lambda_{j,p}$. Suppose
$f \in \mathcal{E}^\beta_\alpha(\mathbb{R}^3)$ for $1 < \alpha \le \beta \le 2$ and
$\nu, \mu > 0$. For fixed
$\hat x = (\hat x_1, \hat x_2, \hat x_3) \in \mathrm{int}(Q_{j,p}) \cap \mathrm{int}(\mathrm{supp}\,\psi_\lambda) \cap \partial B$,
let $H$ be the tangent plane to the discontinuity surface $\partial B$ at
$(\hat x_1, \hat x_2, \hat x_3)$. Then,

(i) if $H$ has normal vector $(-1, s_1, s_2)$ with $s_1 \le 3$ and $s_2 \le 3$,

\[ |\langle f, \psi_\lambda\rangle| \le C \cdot \min_{i=1,2} \Bigl\{ \frac{2^{-j(\alpha/4+1/2)}}{|k_i + 2^{j(\alpha-1)/2} s_i|^{\alpha+1}} \Bigr\}, \tag{8.8} \]

for some constant $C > 0$.

(ii) if $H$ has normal vector $(-1, s_1, s_2)$ with $s_1 \ge 3/2$ or $s_2 \ge 3/2$,

\[ |\langle f, \psi_\lambda\rangle| \le C \cdot 2^{-j(\alpha/2+1/4)\alpha}, \tag{8.9} \]

for some constant $C > 0$.

(iii) if $H$ has normal vector $(0, s_1, s_2)$ with $s_1, s_2 \in \mathbb{R}$, then (8.9) holds.

Proof. Let $(j,k,m) \in \Lambda_{j,p}$, and fix
$\hat x = (\hat x_1, \hat x_2, \hat x_3) \in \mathrm{int}(Q_{j,p}) \cap \mathrm{int}(\mathrm{supp}\,\psi_\lambda) \cap \partial B$.
We first consider the cases (i) and (ii). Let $(-1, s_1, s_2)$ be the normal vector to the
discontinuity surface $\partial B$ at $(\hat x_1, \hat x_2, \hat x_3)$. Let $\partial B$
be parametrized by $(E(x_2,x_3), x_2, x_3)$ with $E \in C^\alpha$ in the interior of
$\mathcal{S}_{j,p}$. We then have $s_1 = \partial^{(1,0)} E(\hat x_2, \hat x_3)$ and
$s_2 = \partial^{(0,1)} E(\hat x_2, \hat x_3)$.

By translation symmetry, we can assume that the discontinuity surface satisfies
$E(0,0) = 0$ with $(\hat x_1, \hat x_2, \hat x_3) = (0,0,0)$. Further, since the conditions
(i) and (ii) in Theorem 6.1 are independent of the translation parameter $m$, it does not
play a role in our analysis. Hence, we simply choose $m = (0,0,0)$. Also, since $\psi$ is
compactly supported, there exists some $L > 0$ such that
$\mathrm{supp}\,\psi \subset [-L,L]^3$. By a rescaling argument, we can assume $L = 1$.
Therefore, we have that

\[ \mathrm{supp}\,\psi_{j,k,0} \subset \mathcal{P}_{j,k}, \]

where $\mathcal{P}_{j,k}$ was introduced in (8.3).

Fix $f \in \mathcal{E}^\beta_\alpha(\mathbb{R}^3)$. We can without loss of generality assume
that $f$ is only nonzero on $B$. We let $\mathcal{P}$ be the smallest parallelepiped which
contains the discontinuity surface parametrized by $(E(x_2,x_3), x_2, x_3)$ in the
interior of $\mathrm{supp}\,\psi_{j,k,0}$. Moreover, we choose $\mathcal{P}$ such that two
sides are parallel to the tangent plane with normal vector $(-1, s_1, s_2)$. Using the
trivial identity $f = \chi_{\mathcal{P}} f + \chi_{\mathcal{P}^c} f$, we see that

\[ \langle f, \psi_{j,k,0}\rangle = \langle \chi_{\mathcal{P}} f, \psi_{j,k,0}\rangle + \langle \chi_{\mathcal{P}^c} f, \psi_{j,k,0}\rangle. \tag{8.10} \]

We will estimate $|\langle f, \psi_{j,k,0}\rangle|$ by estimating the two terms on the right
hand side of (8.10) separately. In the second term
$\langle \chi_{\mathcal{P}^c} f, \psi_{j,k,0}\rangle$ the shearlet only interacts with a
discontinuity plane, and not a general $C^\alpha$ surface, hence this term corresponds to a
linearized estimate (see Section 6.2). Accordingly, the first term is a truncation term.

Let us start by estimating the first term $\langle \chi_{\mathcal{P}} f, \psi_{j,k,0}\rangle$
in (8.10). Using the notation $\hat k_1 = k_1 + 2^{j(\alpha-1)/2} s_1$ and
$\hat k_2 = k_2 + 2^{j(\alpha-1)/2} s_2$, we claim that

\[ |\langle \chi_{\mathcal{P}} f, \psi_{j,k,m}\rangle| \lesssim \min_{i=1,2} \Bigl\{ (1 + s_i^2)^{\frac{\alpha+1}{2}} \frac{2^{-j(\alpha/4+1/2)}}{(1 + |\hat k_i|)^{\alpha+1}} \Bigr\}. \tag{8.11} \]

We will prove this claim in the following paragraphs.

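We can assume that $\hat k_1 < 0$ and $\hat k_2 < 0$ since the other cases can be handled
similarly. We fix $|\hat x_3| \le 2^{-j/2}$ and perform first a 2D analysis on the plane
$x_3 = \hat x_3$. After a possible translation (depending on $\hat x_3$) we can assume that
the tangent line of $\partial B$ on the hyperplane is of the form

\[ x_1 = s_1(\hat x_3)\, x_2 + \hat x_3. \]

Still on the hyperplane, the shearlet normal direction is $(1, k_1/2^{j(\alpha-1)/2})$. Let
$d = d(\hat x_3)$ denote the distance between the two points, where the tangent line
intersects the boundary of the shearlet box $\mathcal{P}_{j,k}$. It follows that

\[ d(\hat x_3) \lesssim (1 + s_1(\hat x_3)^2)^{1/2}\, \frac{2^{-j/2}}{|1 + k_1 + 2^{j(\alpha-1)/2} s_1(\hat x_3)|} \]

as in the proof of Proposition 2.2 in [24]. We can replace $s_1(\hat x_3)$ by
$s_1 = s_1(0)$ in the above estimate. To see this note that $E \in C^\alpha$ implies

\[ s_1(\hat x_3) - s_1(0) \lesssim |\hat x_3|^{\alpha-1} \le 2^{-j(\alpha-1)/2} \]

and thereby,

\[ \frac{2^{-j/2}}{|1 + k_1 + 2^{j(\alpha-1)/2} s_1(\hat x_3)|} = \frac{2^{-j/2}}{|1 + \tilde k_1 + 2^{j(\alpha-1)/2} s_1(0)|}, \]

where $\tilde k_1 = k_1 + 2^{j(\alpha-1)/2}(s_1(\hat x_3) - s_1(0))$. Since

\[ |k_1| - C \le |\tilde k_1| \le |k_1| + C \]

for some constant $C$, there is no need to distinguish between $k_1$ and $\tilde k_1$, and
we arrive at

\[ d(\hat x_3) \lesssim (1 + s_1^2)^{1/2}\, \frac{2^{-j/2}}{1 + |\hat k_1|} =: d \tag{8.12} \]

for any $|\hat x_3| \le 2^{-j/2}$.

The cross section of our parallelepiped $\mathcal{P}$ on the hyperplane will be a
parallelogram with side length $d$ and height $d^\alpha$ (up to some constants). Since
$|x_3| \le 2^{-j/2}$ for $(x_1, x_2, x_3) \in \mathcal{P}_{j,k}$, the volume of
$\mathcal{P}$ is therefore bounded by:

\[ \mathrm{vol}(\mathcal{P}) \lesssim 2^{-j/2}\, d^{1+\alpha} = (1 + s_1^2)^{\frac{\alpha+1}{2}} \frac{2^{-j(\alpha/2+1)}}{(1 + |\hat k_1|)^{\alpha+1}}. \]

In the same way we can obtain an estimate based on $\hat k_2$ and $s_2$ with $\hat k_1$ and
$s_1$ replaced by $\hat k_2$ and $s_2$, thus

\[ \mathrm{vol}(\mathcal{P}) \lesssim \min_{i=1,2} \Bigl\{ (1 + s_i^2)^{\frac{\alpha+1}{2}} \frac{2^{-j(\alpha/2+1)}}{(1 + |k_i + 2^{j(\alpha-1)/2} s_i|)^{\alpha+1}} \Bigr\}. \]

Finally using
$|\langle \chi_{\mathcal{P}} f, \psi_{j,k,0}\rangle| \lesssim \|\psi_{j,k,0}\|_{L^\infty} \mathrm{vol}(\mathcal{P}) \lesssim 2^{j(\alpha/4+1/2)} \mathrm{vol}(\mathcal{P})$,
we arrive at our claim (8.11).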

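We turn to estimating the linearized term in (8.10). This case can be handled as in
the proof of Theorem 8.1, hence we have

\[ |\langle f_0(S_s \cdot)\chi_\Omega, \psi_{j,\hat k,m}\rangle| \lesssim \min_{i=1,2} \Bigl\{ (1 + |s_i|)^\beta
\Bigl( \frac{2^{-j(\alpha/4+1/2)}}{(1 + |\hat k_i|)^{\gamma-1}} + \frac{2^{-j(\alpha/4+\beta/2+1/2)}}{|\hat k_i|^{\beta}} \Bigr) \Bigr\}. \tag{8.13} \]

By summarizing from estimates (8.11) and (8.13), we conclude that

\[ |\langle f, \psi_{j,k,m}\rangle| \lesssim \min_{i=1,2} \Bigl\{ (1 + |s_i|)^\beta
\Bigl( \frac{2^{-j(\alpha/4+1/2)}}{(1 + |\hat k_i|)^{\gamma-1}} + \frac{2^{-j(\alpha/4+\beta/2+1/2)}}{|\hat k_i|^{\beta}} \Bigr)
+ (1 + s_i^2)^{\frac{\alpha+1}{2}} \frac{2^{-j(\alpha/4+1/2)}}{(1 + |\hat k_i|)^{\alpha+1}} \Bigr\}. \tag{8.14} \]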

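If $s_1 \le 3$ and $s_2 \le 3$, this reduces to

\begin{align*}
|\langle f, \psi_{j,k,m}\rangle| &\lesssim \min_{i=1,2} \Bigl\{ \frac{2^{-j(\alpha/4+1/2)}}{(1 + |\hat k_i|)^{\gamma-1}} + \frac{2^{-j(\alpha/4+\beta/2+1/2)}}{|\hat k_i|^{\beta}} + \frac{2^{-j(\alpha/4+1/2)}}{(1 + |\hat k_i|)^{\alpha+1}} \Bigr\} \\
&\lesssim \min_{i=1,2} \Bigl\{ \frac{2^{-j(\alpha/4+1/2)}}{|k_i + 2^{j(\alpha-1)/2} s_i|^{\alpha+1}} \Bigr\},
\end{align*}

since $\gamma \ge 4$ and $\beta \ge \alpha$. On the other hand, if $s_1 \ge 3/2$ or
$s_2 \ge 3/2$, then

\[ |\langle f, \psi_{j,k,m}\rangle| \lesssim 2^{-j(\alpha/2+1/4)\alpha}, \]

which is due to the last term in (8.14). To see this, note that

\begin{align*}
\min_{i=1,2} \Bigl\{ (1 + s_i^2)^{\frac{\alpha+1}{2}} \frac{2^{-j(\alpha/4+1/2)}}{(1 + |\hat k_i|)^{\alpha+1}} \Bigr\}
&\lesssim \min_{i=1,2} \Bigl\{ \frac{2^{-j(\alpha/4+1/2)}}{(|k_i/s_i + 2^{j(\alpha-1)/2}|)^{\alpha+1}} \Bigr\} \\
&\lesssim \frac{2^{-j(\alpha/4+1/2)}}{2^{j(\alpha-1)(\alpha+1)/2}} = 2^{-j(\alpha/2+1/4)\alpha}.
\end{align*}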
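The exponent bookkeeping in the truncation estimate (8.11) and in the large-slope bound (8.9) is elementary but easy to get wrong. The following sketch verifies both identities with exact rational arithmetic at sample values of $\alpha$; the function names are illustrative assumptions, and each function simply encodes the exponent of $2^{j}$ read off from the displays above.

```python
from fractions import Fraction

def truncation_exponent(a):
    """Exponent e with ||psi||_inf * 2^{-j/2} * (2^{-j/2})^{a+1} = 2^{j e}."""
    sup_norm = a / 4 + Fraction(1, 2)          # ||psi_{j,k,0}||_inf ~ 2^{j(a/4 + 1/2)}
    return sup_norm - Fraction(1, 2) - (a + 1) / 2

def large_slope_exponent(a):
    """Exponent e with 2^{-j(a/4+1/2)} / 2^{j(a-1)(a+1)/2} = 2^{j e}."""
    return -(a / 4 + Fraction(1, 2)) - (a - 1) * (a + 1) / 2

samples = [Fraction(n, 10) for n in range(11, 21)]   # alpha in (1, 2]
truncation_ok = all(truncation_exponent(a) == -(a / 4 + Fraction(1, 2)) for a in samples)
large_slope_ok = all(large_slope_exponent(a) == -(a / 2 + Fraction(1, 4)) * a for a in samples)
```

Since both sides of each identity are polynomials of degree at most two in $\alpha$, agreement at ten distinct points confirms the identities for all $\alpha$.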






This completes the proof of the estimates (8.8) and (8.9) in (i) and (ii), respectively.

Finally, we need to consider the case (iii), where the normal vector of the tangent
plane $H$ is of the form $(0, s_1, s_2)$ for $s_1, s_2 \in \mathbb{R}$. The truncation term
can be handled as above, and the linearization term as in the proof of Theorem 8.1. □

9. Proof of Theorem 6.1. Let $f \in \mathcal{E}^\beta_\alpha(\mathbb{R}^3)$. By
Proposition 7.2, for $\alpha \le \beta$, we see that shearlet coefficients associated with
Case 1 meet the desired decay rate (6.2). We therefore only need to consider shearlet
coefficients from Case 2, and, in particular, their decay rate. For this, let $j \ge 0$
be sufficiently large and let $p \in \mathbb{Z}^3$ be such that the associated cube
satisfies $Q_{j,p} \in \mathcal{Q}_j$, hence $\mathrm{int}(Q_{j,p}) \cap \partial B \neq \emptyset$.

Let $\varepsilon > 0$. Our goal will now be to estimate first
$\#|\Lambda_{j,p}(\varepsilon)|$ and then $\#|\Lambda(\varepsilon)|$. By assumptions on
$\psi$, there exists a $C > 0$ so that $\|\psi\|_{L^1} \le C$. This implies that

\[ |\langle f, \psi_\lambda\rangle| \le \|f\|_{L^\infty} \|\psi_\lambda\|_{L^1} \le \mu\, C\, 2^{-j(\alpha+2)/4}. \]

Assume for simplicity $\mu C = 1$. Hence, for estimating $\#|\Lambda_{j,p}(\varepsilon)|$,
it is sufficient to restrict our attention to scales

\[ 0 \le j \le j_0 := \frac{4}{\alpha+2} \log_2(\varepsilon^{-1}). \]
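The cutoff scale follows from solving $2^{-j(\alpha+2)/4} = \varepsilon$ for $j$. A short numerical sketch, with the function name and the sample values of $\alpha$ and $\varepsilon$ chosen purely for illustration:

```python
import math

def max_relevant_scale(eps, alpha):
    """Scale j_0 = 4 / (alpha + 2) * log2(1 / eps), where 2^{-j(alpha+2)/4} equals eps."""
    return 4.0 / (alpha + 2.0) * math.log2(1.0 / eps)

alpha, eps = 1.5, 1e-3
j0 = max_relevant_scale(eps, alpha)

# At j = j0 the trivial coefficient bound equals eps (up to rounding) ...
bound_at_j0 = 2.0 ** (-j0 * (alpha + 2.0) / 4.0)
# ... and at the first integer scale beyond j0 it has dropped below eps.
bound_above = 2.0 ** (-(math.floor(j0) + 1) * (alpha + 2.0) / 4.0)
```

Coefficients at scales above $j_0$ therefore cannot exceed $\varepsilon$ in magnitude and need not be counted.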

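Case 2a. It suffices to consider one fixed
$\hat x = (\hat x_1, \hat x_2, \hat x_3) \in \mathrm{int}(Q_{j,p}) \cap \mathrm{int}(\mathrm{supp}\,\psi_\lambda) \cap \partial B$
associated with one fixed normal $(-1, s_1, s_2)$ in each $Q_{j,p}$; the proof of this fact
is similar to the estimation of the term $\langle \chi_{\mathcal{P}} f, \psi_{j,k,0}\rangle$
in (8.10) in the proof of Theorem 8.2.

We claim that the following counting estimate holds:

\[ \#|M_{j,k,Q_{j,p}}| \lesssim |k_1 + 2^{j(\alpha-1)/2} s_1| + |k_2 + 2^{j(\alpha-1)/2} s_2| + 1, \tag{9.1} \]

for each $k = (k_1, k_2)$ with $|k_1|, |k_2| \le \lceil 2^{j(\alpha-1)/2} \rceil$, where

\[ M_{j,k,Q_{j,p}} := \{m \in \mathbb{Z}^3 : |\mathrm{supp}\,\psi_{j,k,m} \cap \partial B \cap Q_{j,p}| \neq 0\}. \]

Let us prove this claim. Without loss of generality, we can assume
$Q := Q_{j,p} = [-2^{-j/2}, 2^{-j/2}]^3$ and that $H$ is a tangent plane to $\partial B$ at
$(0,0,0)$. For fixed shear parameter $k$, let $\mathcal{P}_{j,k}$ be given as in (8.3).
Note that $\mathrm{supp}\,\psi_{j,k,0} \subset \mathcal{P}_{j,k}$ and that

\[ \#|M_{j,k,Q}| \le C \cdot \#|\{m_1 \in \mathbb{Z} : (\mathcal{P}_{j,k} + (2^{-\alpha j/2} m_1, 0, 0)) \cap H \cap Q \neq \emptyset\}|. \]

Consider the cross section $\mathcal{P}_0$ of $\mathcal{P}_{j,k}$:

\[ \mathcal{P}_0 = \Bigl\{x \in \mathbb{R}^3 : x_1 + \frac{k_1}{2^{j(\alpha-1)/2}} x_2 + \frac{k_2}{2^{j(\alpha-1)/2}} x_3 = 0,\ |x_2|, |x_3| \le 2^{-j/2}\Bigr\}. \]

Then we have

\[ \#|M_{j,k,Q}| \le C \cdot \#|\{m_1 \in \mathbb{Z} : |(\mathcal{P}_0 + (2^{-\alpha j/2} m_1, 0, 0)) \cap H \cap Q| \neq 0\}|. \]

Note that for $|x_2|, |x_3| \le 2^{-j/2}$,

\begin{align*}
H &: x_1 - s_1 x_2 - s_2 x_3 = 0, \quad \text{and} \\
\mathcal{P}_0 + (2^{-\alpha j/2} m_1, 0, 0) &: x_1 - 2^{-\alpha j/2} m_1 + \frac{k_1}{2^{j(\alpha-1)/2}} x_2 + \frac{k_2}{2^{j(\alpha-1)/2}} x_3 = 0.
\end{align*}

Solving

\[ s_1 x_2 + s_2 x_3 = 2^{-\alpha j/2} m_1 - \frac{k_1}{2^{j(\alpha-1)/2}} x_2 - \frac{k_2}{2^{j(\alpha-1)/2}} x_3, \]

we obtain

\[ m_1 = 2^{j/2} \bigl( (k_1 + 2^{j(\alpha-1)/2} s_1) x_2 + (k_2 + 2^{j(\alpha-1)/2} s_2) x_3 \bigr). \]

Since $|x_2|, |x_3| \le 2^{-j/2}$,

\[ |m_1| \le |k_1 + 2^{j(\alpha-1)/2} s_1| + |k_2 + 2^{j(\alpha-1)/2} s_2|. \]

This gives our desired estimate.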
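The bound on $|m_1|$ can be made concrete by evaluating the formula for $m_1$ at the corners of the square $|x_2|, |x_3| \le 2^{-j/2}$, where a linear function attains its extremes. The sketch below uses hypothetical values for the scale, shear index and slopes; the helper names are illustrative.

```python
import math

def shifted_shear(j, alpha, k, s):
    """Pair (k1 + 2^{j(alpha-1)/2} s1, k2 + 2^{j(alpha-1)/2} s2)."""
    scale = 2.0 ** (j * (alpha - 1.0) / 2.0)
    return (k[0] + scale * s[0], k[1] + scale * s[1])

def m1_value(j, alpha, k, s, x2, x3):
    """m_1 = 2^{j/2}((k1 + 2^{j(alpha-1)/2} s1) x2 + (k2 + 2^{j(alpha-1)/2} s2) x3)."""
    k_hat = shifted_shear(j, alpha, k, s)
    return 2.0 ** (j / 2.0) * (k_hat[0] * x2 + k_hat[1] * x3)

j, alpha = 8, 1.5
k, s = (3, -2), (0.25, 0.5)            # hypothetical shear index and surface slopes
k_hat = shifted_shear(j, alpha, k, s)
bound = abs(k_hat[0]) + abs(k_hat[1])

# Extremes of the linear map are attained at the corners of the square.
r = 2.0 ** (-j / 2.0)
corner_max = max(abs(m1_value(j, alpha, k, s, x2, x3)) for x2 in (-r, r) for x3 in (-r, r))

# Number of integers m_1 with |m_1| <= bound.
count = 2 * math.floor(bound) + 1
```

The integer count is at most $2(|\hat k_1| + |\hat k_2|) + 1$, which is the form of the right-hand side of (9.1) up to a constant.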

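Estimate (8.8) from Theorem 8.2 reads
$\frac{2^{-j(\alpha/4+1/2)}}{|k_i + 2^{j(\alpha-1)/2} s_i|^{\alpha+1}} \gtrsim |\langle f, \psi_\lambda\rangle| > \varepsilon$,
which implies that

\[ |k_i + 2^{j(\alpha-1)/2} s_i| \le C \cdot \varepsilon^{-1/(\alpha+1)}\, 2^{-j\frac{\alpha+2}{4(\alpha+1)}} \tag{9.2} \]

for $i = 1, 2$. From (9.1) and (9.2), we then see that

\begin{align*}
\#|\Lambda_{j,p}(\varepsilon)| &\le C \sum_{(k_1,k_2) \in K_j(\varepsilon)} \#|M_{j,k,Q_{j,p}}(\varepsilon)| \\
&\le C \sum_{(k_1,k_2) \in K_j(\varepsilon)} \bigl( |k_1 + 2^{j(\alpha-1)/2} s_1| + |k_2 + 2^{j(\alpha-1)/2} s_2| + 1 \bigr) \\
&\le C \cdot \varepsilon^{-3/(\alpha+1)}\, 2^{-j\frac{3(\alpha+2)}{4(\alpha+1)}},
\end{align*}

where $M_{j,k,Q_{j,p}}(\varepsilon) = \{m \in M_{j,k,Q_{j,p}} : |\langle f, \psi_{j,k,m}\rangle| > \varepsilon\}$
and
$K_j(\varepsilon) = \{k \in \mathbb{Z}^2 : |k_i + 2^{j(\alpha-1)/2} s_i| \le C \cdot \varepsilon^{-1/(\alpha+1)}\, 2^{-j\frac{\alpha+2}{4(\alpha+1)}},\ i = 1, 2\}$.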

Case 2b. By similar arguments as given in Case 2a, it also suffices to consider
just one fixed $x \in \operatorname{int}(Q_{j,p}) \cap \operatorname{int}(\operatorname{supp}(\psi_\lambda)) \cap \partial B$. Again, our goal is now to estimate

$$\#|\Lambda_{j,p}(\varepsilon)|.$$

By estimate (8.9) from Theorem 8.2, $|\langle f, \psi_\lambda\rangle| > \varepsilon$ implies

$$C \cdot 2^{-j(\alpha/2+1/4)\alpha} > \varepsilon,$$

hence we only need to consider scales

$$0 \le j \le j_1 + C, \quad\text{where } j_1 := \frac{4}{(1+2\alpha)\alpha}\log_2(\varepsilon^{-1}).$$

Since $Q_{j,p}$ is a cube with side lengths of size $2^{-j/2}$, we have, counting the number of
translates and shearing, the estimate

$$\#|\Lambda_{j,p}| \le C \cdot 2^{j3(\alpha-1)/2},$$

for some $C$. It then obviously follows that

$$\#|\Lambda_{j,p}(\varepsilon)| \le C \cdot 2^{j3(\alpha-1)/2}.$$

Notice that this last estimate is exceptionally crude, but it will be sufficient for the
sought estimate.
We now combine the estimates for $\#|\Lambda_{j,p}(\varepsilon)|$ derived in Case 2a and Case 2b.
We first consider $\alpha < 2$. Since

$$\#|\mathcal{Q}_j| \le C \cdot 2^{j},$$

we have,

jo jl 

#|A(£)|< Yl 2^' 2J3(«-i)/2 + J2 2J£-3/(«+i)2^'^''^S^ +^2J'2J'=^(«-iV2 

4 o//- ii\ 2(2-a) 2(3a-l) 9c«^ + 17a-10 

< ^5+2 4. £-3/(a+l)^(„ + l)(c+2)(3c-l) ^^-2(2o+l) < ^-(o + l)(c+2)(3c-l) ^ (9 3) 

Having estimated $\#|\Lambda(\varepsilon)|$, we are now ready to prove our main claim. For this,
set $N = \#|\Lambda(\varepsilon)|$, i.e., $N$ is the total number of shearlets $\psi_\lambda$ such that the magnitude
of the corresponding shearlet coefficient $\langle f, \psi_\lambda\rangle$ is larger than $\varepsilon$. By (9.3), it follows
that

$$\varepsilon \lesssim N^{-\frac{(\alpha+1)(\alpha+2)(3\alpha-1)}{9\alpha^2+17\alpha-10}}.$$

This implies that

$$\|f - f_N\|_{L^2}^2 \lesssim \sum_{n>N}|c(f)^*_n|^2 \lesssim N^{-\frac{2(\alpha+1)(\alpha+2)(3\alpha-1)}{9\alpha^2+17\alpha-10}+1} = N^{-\frac{6\alpha^3+7\alpha^2-11\alpha+6}{9\alpha^2+17\alpha-10}},$$

which, in turn, implies

$$|c(f)^*_N| \le C \cdot N^{-\frac{(\alpha+1)(\alpha+2)(3\alpha-1)}{9\alpha^2+17\alpha-10}}.$$

Summarising, we have proven (6.2) and (6.3) for $\alpha \in (1,2)$. The case $\alpha = 2$ follows
similarly. This completes the proof of Theorem 6.1.
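The passage from the counting exponent to the error exponent is plain polynomial algebra, which the following sketch verifies in exact rational arithmetic. The exponent formulas are the ones appearing in this section; the function names are hypothetical.

```python
from fractions import Fraction


def count_exponent(a):
    """Exponent e with #|Lambda(eps)| <~ eps^{-e}, cf. (9.3)."""
    return (9 * a * a + 17 * a - 10) / ((a + 1) * (a + 2) * (3 * a - 1))


def coefficient_decay(a):
    """Exponent d with |c(f)*_N| <~ N^{-d}; the reciprocal of the count exponent."""
    return 1 / count_exponent(a)


def error_exponent(a):
    """Exponent r with ||f - f_N||^2 <~ N^{-r}, from the tail sum of n^{-2d}."""
    return 2 * coefficient_decay(a) - 1


def error_exponent_closed_form(a):
    """The simplified exponent stated in the text."""
    return (6 * a**3 + 7 * a**2 - 11 * a + 6) / (9 * a * a + 17 * a - 10)


samples = [Fraction(n, 8) for n in range(9, 17)]  # alpha = 9/8, ..., 2
identity_holds = all(error_exponent(a) == error_exponent_closed_form(a)
                     for a in samples)
```

At $\alpha = 2$ both the counting exponent and the error exponent evaluate to 1.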

10. Proof of Theorem 6.2. We now allow the discontinuity surface $\partial B$ to be
piecewise $C^\alpha$-smooth, that is, $B \in STAR^\alpha(\nu, L)$. In this case $B$ is a bounded subset
of $[0,1]^3$ whose boundary $\partial B$ is a union of finitely many pieces $\partial B_1, \ldots, \partial B_L$ which
do not overlap except at their boundaries. If two patches $\partial B_i$ and $\partial B_j$ overlap, we
will denote their common boundary $\partial\Gamma_{i,j}$ or simply $\partial\Gamma$. We need to consider four
new subcases of Case 2:

Case 2c. The support of $\psi_\lambda$ intersects two discontinuity surfaces $\partial B_1$ and $\partial B_2$,
but stays away from the 1D edge curve $\partial\Gamma_{1,2}$, where the two patches $\partial B_1$,
$\partial B_2$ meet.

Case 2d. The support of $\psi_\lambda$ intersects two $C^\alpha$ discontinuity surfaces $\partial B_1$, $\partial B_2$ and
the 1D edge curve $\partial\Gamma_{1,2}$, where the two patches $\partial B_1$, $\partial B_2$ meet.

Case 2e. The support of $\psi_\lambda$ intersects finitely many (more than two) $C^\alpha$ discontinuity
surfaces $\partial B_1, \ldots, \partial B_L$, but stays away from a point where all of the
surfaces $\partial B_1, \ldots, \partial B_L$ meet.

Case 2f. The support of $\psi_\lambda$ intersects finitely many (more than two) $C^\alpha$ discontinuity
surfaces $\partial B_1, \ldots, \partial B_L$ and a point where all of the surfaces $\partial B_1, \ldots, \partial B_L$
meet.
In the following we prove that these new subcases will not destroy the optimal
sparse approximation rate by estimating $\#|\Lambda(\varepsilon)|$ for each of the cases. Here, we
assume that each patch $\partial B_i$ is parametrized by a function $E_i$ so that

$$\partial B_i = \{(x_1,x_2,x_3) \in \mathbb{R}^3 : x_1 = E_i(x_2,x_3)\}$$

and < C. The other cases are proved similarly. Also, for each case, we let Qj^p 

be the collection of the dyadic boxes containing the relevant surfaces $\partial B_i$ and may
assume $p = (0,0,0)$ without loss of generality. Finally, we assume $\operatorname{supp}\psi \subset [0,1]^3$
for simplicity and the same proof with rescaling can be applied to cover the general
case. We now estimate $\#|\Lambda(\varepsilon)|$ to show the optimal sparse approximation rate in
each case. For this, we compute the number of all relevant shearlets $\psi_{j,k,m}$ in each of
the dyadic boxes $Q_{j,p}$ applying a counting argument as in Section 9 and estimate the
decay rate of the shearlet coefficients $\langle f, \psi_{j,k,m}\rangle$.




Figure 10.1. Case 2c. A 2D cross section of $\operatorname{supp}\psi_\lambda$ and the two discontinuity surfaces $\partial B_1$
and $\partial B_2$.

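Case 2c. Without loss of generality, we may assume that $(\hat{x}_1, \hat{x}_2, 0)$ and $(\hat{x}'_1, \hat{x}'_2, 0)$
belong to $\partial B_1 \cap \operatorname{supp}\psi_{j,k,m} \cap Q_{j,p}$ and $\partial B_2 \cap \operatorname{supp}\psi_{j,k,m} \cap Q_{j,p}$ respectively for some
$\hat{x}_1, \hat{x}_2, \hat{x}'_1, \hat{x}'_2 \in \mathbb{R}$. Note that for a shear index $k = (k_1,k_2)$ and scale $j \ge 0$ fixed, we
have by a simple counting argument that

$$\#\Big|\bigcap_{i=1,2}\{m \in \mathbb{Z}^3 : \operatorname{int}(\operatorname{supp}\psi_{j,k,m}) \cap \partial B_i \cap Q_{j,p} \neq \emptyset\}\Big| \le C\min_{i=1,2}\{|k_1 + 2^{j(\alpha-1)/2}s_i| + 1\} \qquad (10.1)$$

where $s_1 = \partial^{(1,0)}E_1(\hat{x}_2, 0)$ and $s_2 = \partial^{(1,0)}E_2(\hat{x}'_2, 0)$. For each $\hat{x}_3 \in [0, 2^{-j/2}]$, we
define the 2D slice of $\operatorname{supp}\psi_{j,k,m}$ by

$$(\operatorname{supp}\psi_{j,k,m})_{\hat{x}_3} = \{(x_1,x_2,\hat{x}_3) : (x_1,x_2,\hat{x}_3) \in \operatorname{supp}\psi_{j,k,m}\}.$$

We will now estimate the following 2D integral over $(\operatorname{supp}\psi_{j,k,m})_{\hat{x}_3}$:

$$I_{j,k,m}(\hat{x}_3) = \int_{(\operatorname{supp}\psi_{j,k,m})_{\hat{x}_3}} f(x_1,x_2,\hat{x}_3)\,\psi_{j,k,m}(x_1,x_2,\hat{x}_3)\,dx_1dx_2. \qquad (10.2)$$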
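This integral above gives us the worst decay rate when the 2D support $(\operatorname{supp}\psi_{j,k,m})_{\hat{x}_3}$
meets both edge curves, see Figure 10.1. Therefore, we may assume that for each $\hat{x}_3$
fixed, the set $(\operatorname{supp}\psi_{j,k,m})_{\hat{x}_3}$ intersects two edge curves

$$(\partial B_i)_{\hat{x}_3} = \{(x_1,x_2,\hat{x}_3) : (x_1,x_2,\hat{x}_3) \in \partial B_i \cap Q_{j,p}\} \quad\text{for } i = 1,2.$$

By a similar argument as in Section 8.2, one can linearize the two curves $(\partial B_1)_{\hat{x}_3}$
and $(\partial B_2)_{\hat{x}_3}$ within $(\operatorname{supp}\psi_{j,k,m})_{\hat{x}_3}$. In other words, we now replace the discontinuity
curves $(\partial B_1)_{\hat{x}_3}$ and $(\partial B_2)_{\hat{x}_3}$ by

$$L_i(\hat{x}_3) = \{(s_i(\hat{x}_3)(x_2 - \hat{x}_2) + \hat{x}_1, x_2, \hat{x}_3) \in Q_{j,p} \cap (\operatorname{supp}\psi_{j,k,m})_{\hat{x}_3} : x_2 \in \mathbb{R}\}$$

where

$$s_i(\hat{x}_3) = \frac{\partial E_i(\hat{x}_2, \hat{x}_3)}{\partial x_2} \quad\text{for some } (\hat{x}_1,\hat{x}_2,\hat{x}_3) \in (\partial B_i)_{\hat{x}_3} \text{ and } i = 1,2.$$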



Further, we may assume that the tangent lines $L_i(\hat{x}_3)$ on $(\operatorname{supp}\psi_{j,k,m})_{\hat{x}_3}$ do not
intersect each other. In particular, one can take secant lines instead of the tangent
lines if necessary. The truncation error for the linearization with the secant line
instead of linearization with the tangent line would not change our estimates for
$\#|\Lambda(\varepsilon)|$. Now, on each 2D support $(\operatorname{supp}\psi_{j,k,m})_{\hat{x}_3}$, we have a 2D piecewise smooth
function

$$f(x_1,x_2,\hat{x}_3) = f_0(x_1,x_2,\hat{x}_3)\chi_{\Omega_0} + f_1(x_1,x_2,\hat{x}_3)\chi_{\Omega_1}$$

where $f_0, f_1 \in C^\beta$ and $\Omega_0, \Omega_1$ are disjoint subsets of $[0, 2^{-j/2}]^2$ as in Figure 10.1.
Observe that

$$f = f_0\chi_{\Omega_0} + f_1\chi_{\Omega_1} = (f_0 - f_1)\chi_{\Omega_0} + f_1$$

on $Q_{j,p} \cap (\operatorname{supp}\psi_{j,k,m})_{\hat{x}_3}$. By Proposition 7.3, the optimal rate of sparse approximations
can be achieved for the smooth part $f_1$. Thus, it is sufficient to consider the first
term $(f_0 - f_1)\chi_{\Omega_0}$ in the equation above. Therefore, we may assume that $f = g_0\chi_{\Omega_0}$
with a 2D function $g_0 \in C^\beta$ on $Q_{j,p} \cap (\operatorname{supp}\psi_{j,k,m})_{\hat{x}_3}$. Note that the discontinuities of
the function $f$ lie on the two edge curves $L_i(\hat{x}_3)$ for $i = 1,2$ on $Q_{j,p} \cap (\operatorname{supp}\psi_{j,k,m})_{\hat{x}_3}$.
Applying the same linearized estimates as in Section 8.1 for each of the edge curves $L_i(\hat{x}_3)$,
we obtain

By similar arguments as in (8.12), we can replace $s_i(\hat{x}_3)$ by a universal choice $s_i$ for
$i = 1,2$ independent of $\hat{x}_3$. Since $\hat{x}_3 \in [0, 2^{-j/2}]$, this yields

where $\hat{k}_i = k_1 + 2^{j(\alpha-1)/2}s_i$ for $i = 1,2$ as usual. Also, we note that the number of
dyadic boxes $Q_{j,p}$ containing two distinct discontinuity surfaces is bounded above by
$2^{j/2}$ times a constant independent of scale $j$. Moreover, there are a total of $\lceil 2^{j(\alpha-1)/2}\rceil + 1$
shear indices with respect to the parameter $k_2$. Let us define

Kj{e) = jfci e Z : max{(l + \ki\)~^"^^^'^~^'^} > ej ■ 
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By (10.1) and (10.3), we have 

#|A(e)|< Yl 2^'/'2^'"^ Yl min{l + |fc.l}- 

j=0 ki^Kjie)'' 

Without loss of generality, we may assume $|\hat{k}_1| \le |\hat{k}_2|$. Then

Letting N = ^ \ A{e)\, we therefore have that e < 3~. This imphes that 

n>N 

and this completes the proof. 

Case 2d. Let $\partial\Gamma$ be the edge curve in which two discontinuity surfaces $\partial B_1$ and
$\partial B_2$ meet inside $\operatorname{int}(\operatorname{supp}\psi_{j,k,m})$. Let us assume that the edge curve $\partial\Gamma$ is given
by $(E_1(x_2, \rho(x_2)), x_2, \rho(x_2))$ with some smooth function $\rho \in C^\alpha(\mathbb{R})$. The other case,
$(E_1(\rho(x_3), x_3), \rho(x_3), x_3)$, can be handled in a similar way. Without loss of generality,
we may assume that the edge curve $\partial\Gamma$ passes through the origin and that $(0,0,0) \in
\operatorname{supp}\psi_{j,k,m}$. Let $\kappa = \rho'(0)$, and we now consider the case $|\kappa| \le 1$. The other case,
$|\kappa| > 1$, can be handled by switching the role of variables $x_2$ and $x_3$. Let us consider
the tangent line $L_0$ to $\partial\Gamma$ at the origin. We have

$$L_0 : \frac{x_1}{s_1 + \kappa s_2} = x_2 = \frac{x_3}{\kappa}, \quad\text{where } s_1 = \frac{\partial E_1(0,0)}{\partial x_2} \text{ and } s_2 = \frac{\partial E_1(0,0)}{\partial x_3}.$$

Figure 10.2. Case 2d. The support of $\psi_\lambda$ intersecting the two discontinuity surfaces $\partial B_1$,
$\partial B_2$ and the 1D edge curve $\partial\Gamma$, where the two patches $\partial B_1$ and $\partial B_2$ meet. The 2D cross section
$(\operatorname{supp}\psi_{j,k,m})_{\hat{x}_3}$ is indicated; it is seen as a tangent plane to $\partial\Gamma$.

For each $\hat{x}_3 \in [0, 2^{-j/2}]$ fixed, define

$$(\operatorname{supp}\psi_{j,k,m})_{\hat{x}_3} = \{(x_1, x_2, \kappa x_2 + \hat{x}_3) \in \operatorname{supp}\psi_{j,k,m} : x_1, x_2 \in \mathbb{R}\}.$$
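Also, let

$$s^1_1(\hat{x}_3) = \frac{\partial E_1(\hat{x}_2,\hat{x}_3)}{\partial x_2}, \quad s^1_2(\hat{x}_3) = \frac{\partial E_1(\hat{x}_2,\hat{x}_3)}{\partial x_3}, \quad s^2_1(\hat{x}_3) = \frac{\partial E_2(\hat{x}'_2,\hat{x}_3)}{\partial x_2}, \quad s^2_2(\hat{x}_3) = \frac{\partial E_2(\hat{x}'_2,\hat{x}_3)}{\partial x_3}$$

for some $\hat{x}_2, \hat{x}'_2$ such that

$$(E_1(\hat{x}_2,\hat{x}_3), \hat{x}_2, \hat{x}_3) \in \partial B_1 \cap (\operatorname{supp}\psi_{j,k,m})_{\hat{x}_3} \qquad (10.4)$$

and

$$(E_2(\hat{x}'_2,\hat{x}_3), \hat{x}'_2, \hat{x}_3) \in \partial B_2 \cap (\operatorname{supp}\psi_{j,k,m})_{\hat{x}_3}. \qquad (10.5)$$

If such a point $\hat{x}_2$ (or $\hat{x}'_2$) does not exist, there will be no discontinuity curve on
$(\operatorname{supp}\psi_{j,k,m})_{\hat{x}_3}$, which leads to a better decay of the 2D surface integrals of the form
(10.2). Therefore, we may assume conditions (10.4) and (10.5) hold for any $\hat{x}_3 \in
[0, 2^{-j/2}]$. For $\hat{x}_2$ fixed, let $\hat{k}_1 = (k_1 + \kappa k_2) + 2^{j(\alpha-1)/2}(s_1 + \kappa s_2)$. Applying a similar
counting argument as in Section 9, for the shear index $k = (k_1,k_2)$ fixed, we obtain an
upper bound for the number of shearlets $\psi_{j,k,m}$ intersecting $\partial\Gamma$ inside $Q_{j,p}$ as follows:

$$\#|\{(j,k,m) : \operatorname{int}(\operatorname{supp}\psi_{j,k,m}) \cap Q_{j,p} \cap \partial\Gamma \neq \emptyset\}| \le C(|\hat{k}_1| + 1). \qquad (10.6)$$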



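Notice that there exists a region $\mathcal{P}$ such that the following assertions hold:

(i) $\mathcal{P}$ contains $\partial\Gamma$ inside $\operatorname{supp}\psi_{j,k,m} \cap Q_{j,p}$.

(ii) $\mathcal{P} \subset \{(x_1, x_2, \kappa x_2 + t) \in \operatorname{supp}\psi_{j,k,m} : 0 \le t \le b\} \cap \operatorname{supp}\psi_{j,k,m}$ for some
$b \ge 0$.

Here, we choose the smallest $b$ so that (ii) holds. For each $\hat{x}_3 \in [0, 2^{-j/2}]$ fixed, let
$H_{\hat{x}_3} = \{(x_1, x_2, \kappa x_2 + \hat{x}_3) : x_1, x_2 \in \mathbb{R}\}$. Applying a similar argument as in the proof
of Theorem 8.1 to each of the 2D cross sections $\mathcal{P} \cap H_{\hat{x}_3}$ of $\mathcal{P}$, we obtain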

|fci|2J/2. 



vol(P)<2-Jtf . \ )' 



(10.7) 



Figure 10.2 shows the 2D cross section of $\mathcal{P}$. Let us now estimate the decay rate of the
shearlet coefficients $\langle f, \psi_{j,k,m}\rangle$. Using (10.7),



/(a;)V'j,fe,m(a;)da; 



< 



f{x)ijj,k,m{x)dx 

2-^(¥) 



< C . + 

(l + |fci|)«+i 



f{x)lpj,k,m{^)<ix 

/ /(a:)V'j,fe,m(a;)da; 



(10.8) 



Next, we compute the second integral $\int_{\mathcal{P}^c} f(x)\psi_{j,k,m}(x)\,dx$ in (10.8). For each $\hat{x}_3 \in
[0, 2^{-j/2}]$, define

$$(\operatorname{supp}\psi_{j,k,m})_{\hat{x}_3} = H_{\hat{x}_3} \cap \operatorname{supp}\psi_{j,k,m} \cap \mathcal{P}^c.$$

Again, we assume that on each 2D cross section $(\operatorname{supp}\psi_{j,k,m})_{\hat{x}_3}$, there are two edge
curves $\partial B_1 \cap H_{\hat{x}_3}$ and $\partial B_2 \cap H_{\hat{x}_3}$ since we otherwise could obtain a better decay rate
of $\langle f, \psi_{j,k,m}\rangle$. As we did in the previous case, we compute the 2D surface integral
$I_{j,k,m}(\hat{x}_3)$ over the cross section $(\operatorname{supp}\psi_{j,k,m})_{\hat{x}_3}$ defined as in (10.2). Applying a
similar linearization argument as in Section 8.2, we can now replace the two edge
curves $\partial B_i \cap H_{\hat{x}_3}$ for $i = 1,2$ by two tangent lines as follows:

$$L_1(\hat{x}_3) = \{((s^1_1(\hat{x}_3) + \kappa s^1_2(\hat{x}_3))x_2 + \hat{x}_1,\ x_2 + \hat{x}_2,\ \kappa x_2 + \hat{x}_3) \in \mathbb{R}^3 : x_2 \in \mathbb{R}\}$$

and

$$L_2(\hat{x}_3) = \{((s^2_1(\hat{x}_3) + \kappa s^2_2(\hat{x}_3))x_2 + \hat{x}'_1,\ x_2 + \hat{x}'_2,\ \kappa x_2 + \hat{x}_3) \in \mathbb{R}^3 : x_2 \in \mathbb{R}\}.$$

Here, the points $\hat{x}_1, \hat{x}_2, \hat{x}'_1$, and $\hat{x}'_2$ are defined as in (10.4) and (10.5), and we may
assume that the two lines $L_1(\hat{x}_3)$ and $L_2(\hat{x}_3)$ do not intersect each other within
$(\operatorname{supp}\psi_{j,k,m})_{\hat{x}_3}$; otherwise, we can take secant lines instead as argued in the previous
case. Let $Q_{\hat{x}_3}$ be the projection of $(\operatorname{supp}\psi_{j,k,m})_{\hat{x}_3}$ onto the $x_1x_2$ plane. By the
assumptions on $\psi$, we have

Ij,k,mi^3) = a/I + K^ / f{xi,X2,KX2+X3)lpj^k,mi^i'^'i''^^2 + X3)dX2dXi 
= 2^^ \/l + k"^ / f{xi,X2,KX2+X3) 

5° 2^/2^3 {2'"'^xx + 2J/2(/ei + k2K)x2 + 2i'^k2X3, 2^/2a;2)da;2da;i 

The integral above is of the same type as in (8.5) except for the $x_3$ translation parameter.
The function $f(x_1, x_2, \kappa x_2 + \hat{x}_3)$ has singularities lying on the projection of the
lines $L_1(\hat{x}_3)$ and $L_2(\hat{x}_3)$ onto the $x_1x_2$ plane which do not intersect inside $\operatorname{int}(Q_{\hat{x}_3})$.
Therefore, we can apply the linearized estimate as in the proof of Theorem 8.1 and
obtain



\Ij,k,m{xs)\ < Cmax |2-^t (l + |(A:i + ^fe) + 2^'^(si(*3) + «4(ai3))|)' 



a-l 



By a similar argument as in (8.12), we can now replace 5^/(^3) by universal choices Sj 
for = 1,2 respectively, in the equation above. This implies 



/ 

Jv 



Therefore, from (10.8), (10.9), we obtain 



f{x)i^j,k,mix)dx <C- ^— — . (10.9) 



(l + |A:i|)"+i 



\{f,^1km)\<C . (10.10) 

In this case, the number of all dyadic boxes $Q_{j,p}$ containing two distinct discontinuity
surfaces is bounded above by $2^{j/2}$ up to a constant independent of scale $j$, and there
are $\lceil 2^{j(\alpha-1)/2}\rceil + 1$ shear indices with respect to $k_2$. Let us define

Kj{s) = {fci e Z : (1 + |fci|)-(«+i)2-J'^ > e} . 

Finally, we now estimate $\#|\Lambda(\varepsilon)|$ using (10.6) and (10.10).

# |A(£)| <C Yl 2^"^ 2^/2 ^ (1 + |fci|) < Cs-^ 

which provides the sought approximation rate. 
Case 2e. In this case, we assume that $f = f_0\chi_{\Omega_0} + f_1\chi_{\Omega_1}$ with $f_0, f_1 \in C^\beta$, and
that there are $L$ discontinuity surfaces $\partial B_1, \ldots, \partial B_L$ inside $\operatorname{int}(\operatorname{supp}\psi_{j,k,m})$ so that
each of the discontinuity surfaces is parametrized by $x_1 = E_i(x_2,x_3)$ with $E_i \in C^\alpha$
for $i = 1, \ldots, L$. For each $\hat{x}_3 \in [0, 2^{-j/2}]$, let us consider the 2D support

$$(\operatorname{supp}\psi_{j,k,m})_{\hat{x}_3} = \{(x_1,x_2,\hat{x}_3) \in \operatorname{supp}\psi_{j,k,m} : x_1, x_2 \in \mathbb{R}\}.$$

On each 2D slice $(\operatorname{supp}\psi_{j,k,m})_{\hat{x}_3}$, let

$$\partial\Gamma^i_{\hat{x}_3} = (\operatorname{supp}\psi_{j,k,m})_{\hat{x}_3} \cap \partial B_i \quad\text{for } i = 1, \ldots, L.$$

Observe that there are at most two distinct curves dT^.^ and 9r|^ on {suppipj,k.m)x3 
for some i' = 1, . . . , L. We can assume that there are such two edge curves 5ri_^ and 
dVl^ for each £3 e [0, 2-J/2] since we otherwise could obtain better decay rate of the 
shearlet coefficients \{,f,ipj.k.m)\- From this, we may assume that for each x^, there 
exist (i;i,X2,i;3) and {x'i,x'2,X3) £ int(supp 'i/'j,fe,TO) such that {xi,X2,X3) G dT^{x3) 
and {xi,x'2,X3) G 9r^(x3). We then set: 

4ii3) = ^^^^^ and slix3) = ^^^^^. 

Applying a similar linearization argument as in Section 8.2, we can replace the two
edge curves by two tangent lines (or secant lines) as follows:

$$L^i(\hat{x}_3) = \{(s^i_1(\hat{x}_3)x_2 + \hat{x}_1,\ x_2 + \hat{x}_2,\ \hat{x}_3) : x_2 \in \mathbb{R}\}$$

and

$$L^{i'}(\hat{x}_3) = \{(s^{i'}_1(\hat{x}_3)x_2 + \hat{x}'_1,\ x_2 + \hat{x}'_2,\ \hat{x}_3) : x_2 \in \mathbb{R}\}.$$

Here, we may assume that the two tangent lines $L^i(\hat{x}_3)$ and $L^{i'}(\hat{x}_3)$ do not intersect
inside $(\operatorname{supp}\psi_{j,k,m})_{\hat{x}_3} \cap Q_{j,p}$ for each $\hat{x}_3$. In fact, the number of shearlet supports
$\psi_{j,k,m}$ intersecting $Q_{j,p} \cap \partial B_1 \cap \cdots \cap \partial B_L$, so that there are two tangent lines $L^i(\hat{x}_3)$
and $L^{i'}(\hat{x}_3)$ meeting each other inside $(\operatorname{supp}\psi_{j,k,m})_{\hat{x}_3}$ for some $\hat{x}_3$, is bounded by some
constant $C$ independent of scale $j$. Those shearlets $\psi_{j,k,m}$ are covered by Case 2f, and
we may therefore simply ignore those shearlets in this case. Using a similar argument
as in the estimate of (8.5), one can then estimate $I_{j,k,m}(\hat{x}_3)$ defined as in (10.2) as
follows:

Ii,k,mix3) < C min < —i > . 

" '=1.2 \ (1 + |/ci + 2^ V4(x3)|)«+i J 

Again, applying similar arguments as in (8.12), we may replace the slopes Si(xa) and 
s\ (xs) by universal choices s\{Q) and s\ (0), respectively. This gives 

I (/, V.,,,„) I < C ^ max^ I } . (10.11) 

where k\ = 5^(0)2-' ^^ + fci for i = 1, . . . , L. Further, applying a similar counting 
argument as in Section 9, for k = (^1,^2) and j >0 fixed, we have 

# k, m)} int(supp Vi.fc.m) n dBi n ■ ■ ■ n BBl n Qj,p ^ 0| 

<C _min^{l + |fcj|}. (10.12) 
In this case, the number of all dyadic boxes $Q_{j,p}$ containing more than two distinct
discontinuity surfaces is bounded by some constant independent of scale $j$, and there
are $\lceil 2^{j(\alpha-1)/2}\rceil + 1$ shear indices with respect to $k_2$. Let us define

Kj{s) = jfci e Z : . max^{(l + \k\\)-^"+^h-i'^} > e| . 

Finally, using (10.11) and (10.12), we see that 

\Ai6)\<C ^ ^mmJl + |fci|}<C£-^. 

This proves Case 2e. 

Case 2f. In this case, since the total number of shear parameters k = (fci, A;2) is 
bounded by a constant times 2^ for each j > 0, it follows that 

#|A,-,p(e)|<C-2^-. 

Since there are only finitely many corner points with its number not depending on 
scale j > 0, we have 

^ logs (e~^) 

#|A(£)|<C- 2^<C-e-^, 

which, in turn, implies the optimal sparse approximation rate for Case 2f. This 
completes the proof of Theorem 6.2. 

11. Extensions. 

11.1. Smoothness parameters $\alpha$ and $\beta$. Our 3D image model class $\mathcal{E}^\beta_{\alpha,L}(\mathbb{R}^3)$
depends primarily on the two parameters $\alpha$ and $\beta$. The particular choice of scaling
matrix is essential for the nearly optimal approximation results in Section 6, but
any choice of scaling matrix basically only allows us to handle one parameter. This
of course poses a problem if one seeks optimality results for all $\alpha, \beta \in (1,2]$. We
remark that our choice of scaling matrix exactly "fits" the smoothness parameter of
the discontinuity surface $\alpha$, which is exactly the crucial parameter when $\beta \ge \alpha$ as
assumed in our optimal sparsity results. It is unclear whether one can circumvent the
problem of having "too" many parameters, and thereby prove sparse approximation
results as in Section 6 for the case $\beta < \alpha \le 2$.

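For $\alpha > 2$ we can, however, not expect shearlet systems $SH(\phi, \psi, \tilde{\psi}, \breve{\psi})$ to deliver
optimal sparse approximations. The heuristic argument is as follows. For simplicity
let us only consider shearlet elements associated with the pyramid pair $\mathcal{P}$. Suppose
that the discontinuity surface is $C^2$. Locally we can assume the surface will be of the
form $x_1 = E(x_2,x_3)$ with $E \in C^2$. Consider a Taylor expansion of $E$ at $(x'_2, x'_3)$:

$$E(x_2,x_3) = E(x'_2,x'_3) + \begin{pmatrix}\partial^{(1,0)}E(x'_2,x'_3) & \partial^{(0,1)}E(x'_2,x'_3)\end{pmatrix}\begin{pmatrix}x_2 - x'_2\\ x_3 - x'_3\end{pmatrix}
+ \frac{1}{2}\begin{pmatrix}x_2 - x'_2 & x_3 - x'_3\end{pmatrix}\begin{pmatrix}\partial^{(2,0)}E(\xi_1,\xi_2) & \partial^{(1,1)}E(\xi_1,\xi_2)\\ \partial^{(1,1)}E(\xi_1,\xi_2) & \partial^{(0,2)}E(\xi_1,\xi_2)\end{pmatrix}\begin{pmatrix}x_2 - x'_2\\ x_3 - x'_3\end{pmatrix}. \qquad (11.1)$$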
Intuitively, we need our shearlet elements $\psi_{j,k,m}$ to capture the geometry of $\partial B$. For
the term $E(x'_2,x'_3)$ we use the translation parameter $m \in \mathbb{Z}^3$ to locate the shearlet
element near the expansion point $p := (E(x'_2,x'_3), x'_2, x'_3)$. Next, we "rotate" the
element $\psi_{j,k,m}$ using the shearing parameter $k \in \mathbb{Z}^2$ to align the shearlet normal with
the normal of the tangent plane of $\partial B$ in $p$; the direction of the tangent is of course
governed by $\partial^{(1,0)}E(x'_2,x'_3)$ and $\partial^{(0,1)}E(x'_2,x'_3)$. Since the last parameter $j \in \mathbb{N}_0$ is
a multi-scale parameter, we do not have more parameters available to capture the
geometry of $\partial B$. Note that the scaling matrix $A_{2^j}$ can, for $\alpha = 2$, be written as

$$A_{2^j} = \begin{pmatrix}2^j & 0 & 0\\ 0 & 2^{j/2} & 0\\ 0 & 0 & 2^{j/2}\end{pmatrix} = \begin{pmatrix}2 & 0 & 0\\ 0 & 2^{1/2} & 0\\ 0 & 0 & 2^{1/2}\end{pmatrix}^{j}.$$

The shearlet element will therefore have support in a parallelepiped with side lengths
$2^{-j}$, $2^{-j/2}$ and $2^{-j/2}$ in directions of the $x_1$, $x_2$, and $x_3$ axis, respectively. Since

$$|x_2x_3| \le 2^{-j}, \quad x_2^2 \le 2^{-j} \quad\text{and}\quad x_3^2 \le 2^{-j},$$

for $|x_2|, |x_3| \le 2^{-j/2}$, we see that the paraboloidal scaling gives shearlet elements of
a size that exactly fits the Hessian term in (11.1). If $\partial B \in C^\alpha$ for $1 < \alpha \le 2$,
that is, $E \in C^\alpha$ for $1 < \alpha \le 2$, we in a similar way see that our choice of scaling
matrix exactly fits the last term in the corresponding Taylor expansion. Now, if the
discontinuity surface is smoother than $C^2$, that is, $\partial B \in C^\alpha$ for $\alpha > 2$, say $\partial B \in C^3$,
we could include one more term in the Taylor expansion (11.1), but we do not have
any more free parameters to adapt to this increased information. Therefore, we will
arrive at the same (and now non-optimal) approximation rate as for $\partial B \in C^2$. We
conclude that for $\alpha > 2$ we will need representation systems with not only a directional
characteristic, but also some type of curvature characteristic.
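The scaling heuristic can be illustrated with a few lines of arithmetic. This is a minimal sketch for $\alpha = 2$, assuming second derivatives of $E$ bounded in modulus by a constant (called `hessian_bound` here, a hypothetical name); it compares the largest possible size of the second-order Taylor term over $|x_2|, |x_3| \le 2^{-j/2}$ with the side length $2^{-j}$ of the shearlet support in the $x_1$ direction.

```python
def side_lengths(j, alpha=2.0):
    """Side lengths of the support box along the x1, x2 and x3 axes."""
    return 2.0 ** (-j * alpha / 2.0), 2.0 ** (-j / 2.0), 2.0 ** (-j / 2.0)


def max_quadratic_term(j, hessian_bound=1.0, steps=20):
    """Largest value of M * (x2^2 + 2|x2 x3| + x3^2) on the square
    |x2|, |x3| <= 2^{-j/2}; this dominates the second-order Taylor term
    (up to the factor 1/2) when all second derivatives are bounded by M."""
    h = 2.0 ** (-j / 2.0)
    pts = [h * (2.0 * t / steps - 1.0) for t in range(steps + 1)]
    return max(hessian_bound * (x2 * x2 + 2.0 * abs(x2 * x3) + x3 * x3)
               for x2 in pts for x3 in pts)


# Ratio of the quadratic term to the x1 side length, scale by scale.
ratios = [max_quadratic_term(j) / side_lengths(j)[0] for j in range(1, 12)]
```

The ratio stays constant in $j$, which is the sense in which the scaling "fits" the second-order term: the plate thickness and the curvature deviation shrink at the same rate.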

For $\alpha < 1$, we do not have proper directional information about the anisotropic
discontinuity; in particular, we do not have a tangential plane at every point on
the discontinuity surface. This suggests that this kind of anisotropic phenomenon
should not be investigated with directional representation systems. For the borderline
case $\alpha = 1$, our analysis shows that wavelet systems should be used for sparse
approximations.

11.2. Needle-like shearlets. In place of $A_{2^j} = \operatorname{diag}(2^{j\alpha/2}, 2^{j/2}, 2^{j/2})$, one
could also use the scaling matrix $A_{2^j} = \operatorname{diag}(2^{j\alpha/2}, 2^{j\alpha/2}, 2^{j/2})$ with similar changes
for $\tilde{A}_{2^j}$ and $\breve{A}_{2^j}$. This would lead to needle-like shearlet elements instead of the plate-like
elements considered in this paper. As Theorem 6.2 in Section 6.1 showed, the
plate-like shearlet systems are able to deliver almost optimal sparse approximation
even in the setting of cartoon-like images with certain types of 1D singularities. This
might suggest that needle-like shearlet systems are not necessary, at least not for
sparse approximation issues. Furthermore, the tiling of the frequency space becomes
increasingly complicated in the situation of needle-like shearlet systems, which yields
frames with less favorable frame constants. However, in non-asymptotic analyses,
e.g., image separation, a combined needle-like and plate-like shearlet system might be
useful.

11.3. Future work. For $\alpha < 2$, the obtained approximation error rate is only
near-optimal since it differs by $\tau(\alpha)$ from the true optimal rate. It is unclear whether
one can get rid of the $\tau(\alpha)$ exponent (perhaps replacing it with a poly-log factor)
by using better estimates in the proofs in Section 8. More generally, it is also future
work to determine whether shearlet systems with $\alpha, \beta \in (1,2]$ provide nearly or truly
optimal sparse approximations of all $f \in \mathcal{E}^\beta_{\alpha,L}(\mathbb{R}^3)$. To answer this question, one would,
however, need to develop a completely new set of techniques. This would mean that
the approximation error would decay as $O(N^{-\min\{\alpha/2,\, 2\beta/3\}})$ as $N \to \infty$, perhaps with
additional poly-log factors or a small polynomial factor.
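To make the size of the gap concrete, the sketch below compares the error exponent obtained at the end of Section 9 with the rate $\alpha/2$. It assumes that this difference is what the text calls $\tau(\alpha)$; the closed form used in `gap_closed_form` is derived here by direct simplification and is not quoted from the paper, and all function names are hypothetical.

```python
from fractions import Fraction


def achieved_exponent(alpha):
    """Error exponent from Section 9: ||f - f_N||^2 <~ N^{-r(alpha)}."""
    return (6 * alpha**3 + 7 * alpha**2 - 11 * alpha + 6) / (
        9 * alpha**2 + 17 * alpha - 10)


def gap_to_optimal(alpha):
    """Difference between alpha/2 and the achieved exponent (assumed tau)."""
    return alpha / 2 - achieved_exponent(alpha)


def gap_closed_form(alpha):
    """Simplified form of the gap, obtained by factoring the numerator."""
    return 3 * (alpha - 1) * (2 - alpha) * (alpha + 2) / (
        2 * (9 * alpha**2 + 17 * alpha - 10))


def conjectured_exponent(alpha, beta):
    """Exponent min{alpha/2, 2*beta/3} from the conjectured decay rate."""
    return min(alpha / 2, 2 * beta / 3)


samples = [Fraction(n, 10) for n in range(10, 21)]  # alpha = 1.0, ..., 2.0
gaps = [gap_to_optimal(a) for a in samples]
```

The gap vanishes at both endpoints $\alpha = 1$ and $\alpha = 2$ and is strictly positive in between, which matches the statement that the rate is only near-optimal for $\alpha < 2$.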

Acknowledgements. The first and third author acknowledge support from DFG Grant 
SPP-1324, KU 1446/13. The first author also acknowledges support from DFG Grant KU 1446/14. 

Appendix A. Estimates. The following estimates are used repeatedly in Section 5
and follow by direct verification. For $t = 2^{-m}$, i.e., $-\log_2 t = m$, $m \in \mathbb{N}_0 :=
\mathbb{N} \cup \{0\}$, we have

— log2 t 

°« fi 

{i6No:2-J<t} 3=-\og^t 

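For $t \in (0,1]$, we have $\lfloor -\log_2 t\rfloor \in \mathbb{N}_0$ and therefore

$$\sum_{\{j \in \mathbb{N}_0 : 2^{-j} \ge t\}} (2^{-j})^{-\iota} = \sum_{j=0}^{\lfloor -\log_2 t\rfloor} (2^{j})^{\iota} \le \frac{t^{-\iota}}{1 - 2^{-\iota}} \quad\text{for } \iota > 0, \qquad (A.1)$$

$$\sum_{\{j \in \mathbb{N}_0 : 2^{-j} \le t\}} (2^{-j})^{\iota} = \sum_{j=\lceil -\log_2 t\rceil}^{\infty} (2^{-j})^{\iota} \le \frac{t^{\iota}}{1 - 2^{-\iota}} \quad\text{for } \iota > 0, \qquad (A.2)$$

where we have used that $2^{\lfloor -\log_2 t\rfloor} \le t^{-1}$ and $2^{-\lceil -\log_2 t\rceil} = 2^{\lfloor \log_2 t\rfloor} \le t$. For $t > 1$ we
finally have that

$$\sum_{\{j \in \mathbb{N}_0 : 2^{-j} \ge t\}} (2^{-j})^{-\iota} = 0 \quad\text{and}\quad \sum_{\{j \in \mathbb{N}_0 : 2^{-j} \le t\}} (2^{-j})^{\iota} = \sum_{j=0}^{\infty} (2^{-j})^{\iota} = \frac{1}{1 - 2^{-\iota}}. \qquad (A.3)$$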
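These geometric-sum bounds are easy to test numerically. The sketch below assumes the bounds take the form $t^{\mp\iota}/(1 - 2^{-\iota})$, which follows from summing the geometric series and using the two floor and ceiling inequalities quoted above; the infinite sums are truncated after a fixed number of terms, and the function names are hypothetical.

```python
import math


def sum_coarse(t, iota):
    """Sum of (2^j)^iota over j in N_0 with 2^{-j} >= t, for 0 < t <= 1."""
    top = math.floor(-math.log2(t))
    return sum(2.0 ** (j * iota) for j in range(top + 1))


def sum_fine(t, iota, terms=200):
    """Truncated sum of (2^{-j})^iota over j in N_0 with 2^{-j} <= t."""
    start = max(0, math.ceil(-math.log2(t)))
    return sum(2.0 ** (-j * iota) for j in range(start, start + terms))


def geometric_constant(iota):
    """The constant 1 / (1 - 2^{-iota}) shared by all three estimates."""
    return 1.0 / (1.0 - 2.0 ** (-iota))


checks = []
for t in (1.0, 0.5, 0.3, 0.01):
    for iota in (0.5, 1.0, 3.0):
        c = geometric_constant(iota)
        checks.append(sum_coarse(t, iota) <= t ** (-iota) * c * (1 + 1e-12))
        checks.append(sum_fine(t, iota) <= t ** iota * c * (1 + 1e-12))
```

For $t > 1$ every $j \in \mathbb{N}_0$ satisfies $2^{-j} \le t$, so `sum_fine(3.0, 1.0)` reduces to the full geometric series.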



Appendix B. Proof of Proposition 5.2. We start by estimating $\Gamma(2\omega)$, and
will use this later to derive the claimed upper estimate for $R(c)$. For brevity we will
use Kj := [- [2^("-i)/2] , p2^(a-i)/2-|j g j^. ^^^^^ ^^^^^ g gy definition 

it then follows that 



r(2a;i,2w2,2w3) 

< esssup E E k (2-^"/'a, A;i2-^"/2^i + 2-^/2^2, /e22-^"/2^i + 2-^/2^3) 

fsK^ j>o feeKj 

• (2-^"/2a + 2u;i, A;i2-^"/2ei + 2-^V2^2 + 20.2, fc22-^«/25i + 2-^"/2^3 + 20.3) • 

For each (wi, a;2, wa) G M"^ \ {0}, we first split the sum over the index set No into 
index sets Ji = { j > : |2-J"/2^i| < and J2 = {j > : |2-W2^^| > 

We denote these sums by I\ and /2, respectively. In other words, we have that 

r(2a;i,2a;2,2a;3) < esssup(/i + h), (B.l) 

£eM3 
where

■ U(2-^"/2^i + 2a;i, fci2-^"/2a + 2-^/2^2 + 2uj2, k22-^"/^^i + 2-^'^^^ + 2^3) 



and 



^2=E E \i'{^-'''''^^M2-^'''^^i+2-i/^^2M2-^'''Hi+2-^/Hz) 
■ I (2-J"/2a + 2u;i, A:i2-^"/2a + 2-^/2^2 + 2u;2, k22-^'''^^^ + 2-^/2^3 + 2^3) 



The next step consists of estimating $I_1$ and $I_2$, but we first introduce some useful
inequalities which will be needed later. Recall that $\delta > 2\gamma > 6$, and $q, q', r, s$ are
positive constants satisfying $q', r, s \in (0, q)$. Further, let $\gamma'' = \gamma - \gamma'$ for an arbitrarily
fixed $\gamma'$ satisfying $1 < \gamma' < \gamma - 2$. Let $\iota > \gamma > 3$. Then we have the following
inequalities for $x, y, z \in \mathbb{R}$.

min{l,|ga;|'-}min{l,|ry|-'^} < min{l, min {l, Vt/|-^} , (B.2) 



min{l, |x| '''}min<l, 



l + z 



x + y 



<2'' \y\-^ min{l,|a;|-"' }max{l,|l+zp }, (B.3) 



min{l,|ga;|'-^}min{l,|g'xr^}|xp" < (g')"^" 



(B.4) 



and 

min{ 1 , 1 I '-"^ } min{ l,\q'x\-''}\xp" < {q'y" min{ 1 , | ga; | ''-''+^" } min{ l,\q'x\-'''}. 

(B.S) 

We fix $\xi \in \mathbb{R}^3$ and start with $I_1$. By the decay assumptions (4.1) on $\hat{\psi}$, it follows
directly that



h<Yl min{|g2-^«/2^i|^l}min||g'2--''«/2^i|-T,l} 



jeJi 



min||g(2-J"/2a +2wi) 



1 > rnin 



J2 min{|r(fci2-^«/2^i +2-^/2^2)1 min{|r(/ci2-^«/2^i + 2-^/2^2 + 2^2) | 



^ min{|s(fc22-^«/2Ci+2-^/26)| niin{|s(fc22-^«/2a+ 2-^/'6 + 2a;3)| "'}■ 

k2eKj 
Further, using inequality (B.2) with i = 5 and l = 25 twice, 
h<Yl min{|52-^«/2^i|^-2T,l}min{|g'2-^"/2^i|-^,l} 



|<5-27 



min I k2-^"/2^i + 2wi) , 1 Uin <^ g'(2-^"/2^i + 2wi) 



,1 



min ■ 



(fa+2-.¥&) 



2a;2 
2-i«/ 



2wi 



2-i«/2^i 



min ■ 

2W3 



(fc2+2^'^|) 



,1 



1 + 



2a;i 



2-ia/2^. 

where we, e.g., in the sum over fci, have used paraphrases as 
r(fci2-^«/2a + 2-J/2^2) _ r 



2-i"/2^i 



,U, (B.6) 



g2-J«/2^i 



and 



r(fci2-J"/2ei + 2-^/2^2 + 2w2_) ^ r 

9 



?(2-^«/2a + 2a;i) 



2W2 



2-J«/2^i 
+ (A;i+2^"^|) 



1 + 



2a;i 



We now consider the following three cases: = |cji| > |2 Ht^Hco = 

|w2| > |2--'"/2^i|, and = jwa] > |2--'«/2^i|. Notice that these three cases indeed 

do include all possible relations between w and ^i. 

Case I. We assume that = > |2--''"/2^i|, hence |2--'"/2^i + 2wi | > 

Using the trivial estimates min{|g(2~-'"/2^^ _|_ 2w\)\ , 1} < 1, 

2uji 



2W2 

2-J«/ 



1 + 



2-J«/2^i 



U < 1, 



and analogue estimates for the sum over ^2, we can continue (B.6), 

h<Y^ min{|g2-^«/2ai'^"^^,l}min{|g'2-^"/2ar^,l} |9'(2--''"/^a + 2a;i) 
5^minj ^(/ci + 2^"?^|) V| E™i4 J(^2 + 2^"?^| 
r 

q 



Our assumption = |wi| implies \q'{2 J"/^^^ +2a;i)| < H^'wHoJ- Therefore, 

(fc,+2^-^i; 



1 > min 



«i:r„J -Ai^ 



r '-^ a 

fciGZ ^ 



9 s . 

- mm • 



By the estimate (5.4) with y = r/q< 1 (and y = s/g < 1) as constant, we can bound 
the sum over fci (and ^2), leading to 

^1 < Ik'^^ll^o'' E { |92-^'"/'6 f , 1} min | \q'2-i'''Hi[^ , l| ^C(7) ^^(7). 

Taking the supremum over ^1 = ?7i/g e K and using equations (A.l) and (A. 2) as in 
the proof of Proposition 5.1 yields 



/i<^C(7ril«'^|loo\sup E^i^ 



2-i"/2 



(5-27 



1 > min 



< 



' C(7)^|k'-|loo^ ( ^log,(^) 



rs 



Q \q' 



1 _ 2-'5+27 



+ 1 . 



(B.7) 



Case II. We now assume that = |w2| > |2--'"/2^i|. For 7 = 7' +7", 



7 > y + 2 > 3, 7' > 1, 7" > 2 by (B.3) 



||^(fci+2^-^|)| \l|min| 



1 I 2a)i 



< 2'^ 



r 2a;2 



, 1 > max < 



1 + 



2a;i 



Applied to (B.6) this yields 

h<Y, min{|g2-^"/2^i|^-2^l}min{|g'2-^"/2ei|-'',l} 



2-J°/2^i 



g(2-^°/2^i + 2wi)f , 1 > mm 



feiGZ 



r 2^9 



- [k, 



+ 2^-^ I) 



1 > max ■ 



^minj ^(fc2 + 2^"?^|) (B.8) 



-7 




,1 


1 




2-J"/2^i 








> . 
Hence, by estimate (5.4),

t 
rs 



h < 2'^"^C{^)C{i)\\2^^w\\-J' min{|g2-^«/2^i|*-2^l}min{|g'2-W2^i|-^l} 



• mm 



in||g(2-J«/2,ei+2wi)f ''',l|min| 



^'(2-W2^^+2wi)| \l 

2. 



max • 



1 



2-J«/2^i 



7 



A). (B.9) 



We further split Case II into the following two subcases: 1 < |1 + 2-w2g^ I and 
1 > |1 + 2^J^72^I- Now, in case 1 < |1 + \ , then obviously 



2-j«/2^^ 



max < 1, 



1 + 



2wi 



2-J«/2^i 



< 



2-W2^^+2a;i 



which used in (B.9) yields 



rs 



C(7)C(7')II^«'II^^" E min {|<72-^-«/2ei|*-2^ l} min {\q'2-^"^%r , l} 



je.h 



5(2-W2^^+2a;i) 



(5-27 



1 > min 



g'(2-^"/2^i+2a;i) ,ll 2-^"/^^i + 2uJi 



Hence, by inequality (B.4) with t = 5 — 7, i.e.. 



^(2-i«/2^^ + 2a;i) 



S-2j 



1 > min 



we arrive at 
^2 



h < ^^C{j)C{yW-:fw\U" E niin{|g2-W2^i|^-2^l}min{|5'2-^"/2a|-^l} 



rs 

^2 



<^C7(7)C(Y)IIV«'llc 



'2 



1 



1 _ 2-*+27 



+ 1 



(B.IO) 



On the other hand, if 1 > |1 + ^z^^fr^J, then, 

min {|g(2-^ta + 2uji)f~\ l} min { |g'(2-^ + 2uji)\~\ l} max { |l + ^ T", l} < 1, 
for all j > 0. Hence from (B.9), by employing inequality (B.5), we arrive at 

jeJi 

< f.^c{j)C{j')^wU" E (a')"^" min{\q2-^^i,\'-'-r+^",l} min {|g'2-^"tar^', l} 

^2 



jeJi 



1 



1 _ 2-'5+27-7" 



(B.ll) 
Case III. This case is similar to Case II and the estimates from Case II hold with
the obvious modifications. We therefore skip the proof.

We next estimate $I_2$. First, notice that the inequality (B.6) still holds for $I_2$ with
the index set $J_1$ replaced by $J_2$. Therefore, we obviously have



I2<Y1 min{|g2-^«/2^i|'^-2T,l}min{|g'2-^"/2^i|-T,l} 



min ■ 

fciSZ 



(fci + 2^'^|) \ll ^ minJ ' (fc2 + 2^'^| 



,1 , 



by (5.4), 



2 

h < Yl min{|g2-J«/2^i|*-2T,l}min{|g'2-^«/2^i|-T,l} 



rs 



2 ll„/, .l|-7 

00 ' 



(B.12) 



Summarising, using (B.l), (B.7), and (B.12), we have that 



r(2.)<^^(k,(l 

rs 119 ^11-- V vo' 



+ 1 _ 2-'5+27 



1-2-7; 



+ 



whenever \\co\\^ = \coi\, and by (B.IO), (B.ll), and (B.12), 



^ ^ r.s || g m'"{'-.s} „,||7" \n ■'Kg' J 



+ 



1 _ 2-^+27 1 - 2-^+27-7' 



g(7)^ 1 
rs ||g'a;|r 1-2-7' 



.2) 



+ 
otherwise. We are now ready to prove the claimed estimate for $R(c)$. Define
Q = {m e : |mi| > |m2| and |mi| > Imsl} , 

and 

Q = {m e : cj^^|mi| > C2 ^|m2| and cj^^|mi| > c^^|?Ti3|} . 
If m e Q, that is, if cj"^|mi| > ^|m2| and cj"^|mi| > c^^|m2|, then 



r(±M-^m) < 



q' C{n f (2c 



rs m 



+ 



+ 



1 _ 2-'5+27 1-2-7/ 

= (Ti+T3)||m|| 



If on the other hand m e Qf^\{Q}, that is, if ^|mi| < C2 ^|?n2| or ^|mi| < C2 ^|?Ti3| 
with m ^ 0, then 



r(±M-iTO) < 
1 



q^C{^)C{i) /2qC2y 



^« l|m||3o 
1 



q'r 



l0g2 



1 



1 



1 _ 2-'5+27 1 - 2-7 



+ 



1 _ 2-'5+27-7" 1 - 2-7' 



rs m 



1 - 2-7 



(T2+T3)|H|3,, 
Therefore, we obtain



i?(c)= (r(M-im)r(-M-im)) 

meZ3\{0} 

< f E 2^ilHI^''+^3lH|--^ ) + ( E TMj'+TsWmW 



\meQ'\{0} 



(B.13) 



Notice that, since Q C Q, 



Also, we have 



E iiHr"< E HI 



||m|| < 3min 

meQ<=\{0} 

Therefore, (B.13) can be continued by 



,2[ E HI 

meQ<=\{0} 



R{c)<T, E HII 

meZ3\{0} 



J + Ti E HI 

TO6C 



3min| - ,2\t2 V 



m£Q<=\{0} 



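To provide an explicit estimate for the upper bound of $R(c)$, we compute $\sum_{m \in Q}\|m\|_\infty^{-\gamma}$
and $\sum_{m \in Q^c\setminus\{0\}}\|m\|_\infty^{-\gamma}$ as follows:

$$\sum_{m \in \mathbb{Z}^3\setminus\{0\}} \|m\|_\infty^{-\gamma} = \sum_{d=1}^{\infty} (24d^2 + 2)\,d^{-\gamma} = 24\zeta(\gamma - 2) + 2\zeta(\gamma),$$

where $(2d+1)^3 - (2d-1)^3 = 24d^2 + 2$ is the number of lattice points in $\mathbb{Z}^3$ at distance
$d$ (in max-norm) from the origin. Further,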


E HII 

meQ 



2 E (2mi - l)^m7^ = E (Sm?"^ - Smi"^ + 2771^^) 



mi=l 



■mi — 1 



8C(7 - 2) - 4C(7 - 1) + 2C(7) 



and 



E Hlir = 24C(7 - 2) + 2C(7) - (8C(7 - 2) - 4C(7 - 1) + 2C(7)) 

meQ=\{0} 

= 16C(7-2)-4C(7-l), 



which completes the proof. 
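The two lattice-point counts used in this computation can be confirmed by brute force. The sketch below enumerates the max-norm shell of radius $d$ in $\mathbb{Z}^3$ and, within it, the points whose first coordinate strictly dominates the other two (the set called $Q$ above, read here as $|m_1| > |m_2|$ and $|m_1| > |m_3|$); the function names are hypothetical.

```python
def shell(d):
    """All lattice points of Z^3 at max-norm distance exactly d from the origin."""
    r = range(-d, d + 1)
    return [(a, b, c) for a in r for b in r for c in r
            if max(abs(a), abs(b), abs(c)) == d]


def in_cone(m):
    """True if the first coordinate strictly dominates the other two."""
    return abs(m[0]) > abs(m[1]) and abs(m[0]) > abs(m[2])


shell_sizes = {d: len(shell(d)) for d in range(1, 8)}
cone_sizes = {d: sum(1 for m in shell(d) if in_cone(m)) for d in range(1, 8)}
```

For each tested radius the shell has $24d^2 + 2$ points, of which $2(2d-1)^2$ lie in the cone, in agreement with the summands $(24d^2+2)d^{-\gamma}$ and $2(2m_1-1)^2 m_1^{-\gamma}$.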